merge

Module: missions.operations.merge

Define training and test data for One-versus-Rest or Rest-versus-One evaluation in a cross-validation fashion

For every dataset in the input_path, the result summary of this operation contains one dataset that uses the data of that dataset as test data and the data of all other datasets as training data. For instance, if the input consists of the three datasets “A”, “B”, “C”, the result summary will contain the three datasets “Rest_vs_A”, “Rest_vs_B”, and “Rest_vs_C”. The result dataset “Rest_vs_A” uses the data of dataset “A” as test data and the data of all other datasets as training data. If reverse is True, this results in the three datasets “A_vs_Rest”, “B_vs_Rest”, and “C_vs_Rest”.
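The naming scheme described above can be sketched as follows. The helper function is purely illustrative and not part of the pySPACE API; it only mirrors how result-dataset names are composed from the input names and the name_pattern parameter:

```python
def result_dataset_names(dataset_names, name_pattern="Rest", reverse=False):
    """Illustrative sketch of the result-dataset naming scheme.

    For every input dataset, one result dataset is produced whose name
    combines the dataset name with name_pattern ("Rest" by default).
    """
    if reverse:  # One_vs_Rest scheme
        return ["%s_vs_%s" % (name, name_pattern) for name in dataset_names]
    else:        # Rest_vs_One scheme (default)
        return ["%s_vs_%s" % (name_pattern, name) for name in dataset_names]
```

For the example above, `result_dataset_names(["A", "B", "C"])` yields `["Rest_vs_A", "Rest_vs_B", "Rest_vs_C"]`, and with `reverse=True` it yields `["A_vs_Rest", "B_vs_Rest", "C_vs_Rest"]`.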

Specification file Parameters

type

Has to be set to merge to use this operation!

(obligatory, merge)

input_path

The input path of this operation has to contain several datasets of one of the types time_series or feature_vector. The input datasets must not contain split data.

(obligatory)

name_pattern

String to customize ‘Rest’ in the name of the result dataset.

(optional, default: ‘Rest’)

reverse

Switch to use the One_vs_Rest scheme instead of the Rest_vs_One scheme

(optional, default: False)

set_flag

If set_flag is True, the first time series object that is merged from a different dataset gets a flag. This is only done if the data is stored in pickle format.

(optional, default: True)

collection_constraints

Optionally, constraints can be passed to the operation that specify which datasets are used as training data for which test data. For instance, the constraint ‘”%(source_train_collection_name)s”.strip(“}{”).split(“}{”)[1:] == “%(source_test_collection_name)s”.strip(“}{”).split(“}{”)[1:]’ would ensure that only datasets that were created by the same processing with the same parameterization are combined.

(optional, default: [])
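Such a constraint is a Python expression template: the %(...)s placeholders are substituted with the respective collection names, and the resulting string is evaluated as a boolean expression. A minimal sketch of this mechanism, under the assumption that the placeholders are filled via Python’s %-formatting and evaluated with eval (the helper name is hypothetical, not pySPACE API):

```python
def fulfills_constraints(constraints, train_name, test_name):
    """Sketch: substitute the collection names into each constraint template
    and evaluate the resulting Python expression; all constraints must hold."""
    substitutions = {
        "source_train_collection_name": train_name,
        "source_test_collection_name": test_name,
    }
    return all(eval(constraint % substitutions) for constraint in constraints)

# The constraint from the example above: collection names of the assumed
# form "{name}{param1}{param2}" match if everything after the first
# "}{"-separated token (i.e. the parameterization) is identical.
constraint = ('"%(source_train_collection_name)s".strip("}{").split("}{")[1:]'
              ' == '
              '"%(source_test_collection_name)s".strip("}{").split("}{")[1:]')
```

With collection names like "{A}{param1}", the constraint accepts the pair ("{A}{param1}", "{B}{param1}") but rejects ("{A}{param1}", "{B}{param2}").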

Exemplary Call

type: merge
input_path: "operation_results/2009_8_13_15_8_57"
reverse: False
collection_constraints:
  # Combine only collections that have been created using the same parameterization
  - '"%(source_train_collection_name)s".strip("}{").split("}{")[1:] == "%(source_test_collection_name)s".strip("}{").split("}{")[1:]'

Inheritance diagram for pySPACE.missions.operations.merge:

Inheritance diagram of pySPACE.missions.operations.merge

Class Summary

MergeOperation(processes, operation_spec, ...) Operation to create ‘All_vs_One’ datasets
MergeProcess(input_collection, ...) Create ‘All_vs_One’ collections where ‘All’ are all collections that fulfill the collection_constraints and are different from the “One” collection

Classes

MergeOperation

class pySPACE.missions.operations.merge.MergeOperation(processes, operation_spec, result_directory, number_processes, create_process=None)[source]

Bases: pySPACE.missions.operations.base.Operation

Operation to create ‘All_vs_One’ datasets

Class Components Summary

_createProcesses(processes, ...) [factory method] Create the MergeProcess object.
consolidate() Consolidation of the operation’s results
create(operation_spec, result_directory[, ...]) [factory method] Create a MergeOperation object.
__init__(processes, operation_spec, result_directory, number_processes, create_process=None)[source]
classmethod create(operation_spec, result_directory, debug=False, input_paths=[])[source]

[factory method] Create a MergeOperation object.

A factory method that creates a MergeOperation based on the information given in the operation specification operation_spec

classmethod _createProcesses(processes, input_collections, result_directory, collection_constraints, reverse, set_flag, name_pattern)[source]

[factory method] Create the MergeProcess object.

consolidate()[source]

Consolidation of the operation’s results

MergeProcess

class pySPACE.missions.operations.merge.MergeProcess(input_collection, result_directory, collection_constraints, reverse, set_flag, name_pattern)[source]

Bases: pySPACE.missions.operations.base.Process

Create ‘All_vs_One’ collections where ‘All’ are all collections that fulfill the collection_constraints and are different from the “One” collection

Restricted to pickle and arff files!

Class Components Summary

__call__() Executes this process on the respective modality
_copy_arff_file(input_arff_file_path, ...) Copy the arff files and adjust the relation name in the arff file
_copy_file(source_collection_path, ...) Copy a dataset to a new destination
_merge_arff_files(target_arff_file_path, ...) Copy the instances from the merge arff files to the target arff file
_merge_files(target_collection_path, ...) Merge all collections in source_collection_pathes and store them in the target collection
__init__(input_collection, result_directory, collection_constraints, reverse, set_flag, name_pattern)[source]
__call__()[source]

Executes this process on the respective modality

_merge_arff_files(target_arff_file_path, merge_arff_file_pathes, target_collection_name)[source]

Copy the instances from the merge arff files to the target arff file

_copy_arff_file(input_arff_file_path, target_arff_file_path, input_collection_name, target_collection_name)[source]

Copy the arff files and adjust the relation name in the arff file

_merge_files(target_collection_path, source_collection_pathes, train_set_name_suffix, target_collection_params)[source]

Merge all collections in source_collection_pathes and store them in the target collection

Parameters

target_collection_path:
 Path of the dataset, in which the data of all other datasets is assembled.
source_collection_pathes:
 Paths of the datasets to be merged.
train_set_name_suffix:
 Either ‘train’ or ‘test’. Specifies if datasets are merged for training or testing.
target_collection_params:
 Dictionary with all the parameters of the target dataset.
_copy_file(source_collection_path, target_collection_path, train_set_name_suffix)[source]

Copy a dataset to a new destination

Parameters

source_collection_path:
 The path to the dataset that has to be copied.
target_collection_path:
 The path to where the dataset should be copied.
train_set_name_suffix:
 Either ‘train’ or ‘test’. Specifies if the target dataset is handled as training or testing data.