merge¶
Module: missions.operations.merge
Define train and test data for One-versus-Rest or Rest-versus-One evaluation in a cross-validation fashion.
The result summary of this operation contains one dataset for every dataset in the input_path, which uses the data from this dataset as test data and the data of all other datasets as training data. For instance, if the input consists of the three datasets “A”, “B”, “C”, the result summary will contain the three datasets “Rest_vs_A”, “Rest_vs_B”, and “Rest_vs_C”. The result dataset “Rest_vs_A” uses the data from dataset “A” as test data and the data from all other datasets as training data. If reverse is True, this will instead result in the three datasets “A_vs_Rest”, “B_vs_Rest”, and “C_vs_Rest”.
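The pairing described above can be sketched in plain Python. The function below is purely illustrative and not part of the pySPACE API; `datasets` is a hypothetical mapping from dataset name to its samples, and the assumption that reverse also swaps the train/test roles follows from the One-versus-Rest reading of the result names.

```python
def merge_splits(datasets, reverse=False, name_pattern="Rest"):
    """Pair every dataset against the union of all others.

    `datasets` maps dataset names to lists of samples.  Returns a dict
    mapping result-dataset names to (train_data, test_data) tuples.
    """
    result = {}
    for name, own_data in datasets.items():
        # The "Rest": all samples from every other dataset.
        rest_data = [s for other, samples in datasets.items()
                     if other != name for s in samples]
        if reverse:
            # "A_vs_Rest": train on the single dataset, test on the rest.
            result["%s_vs_%s" % (name, name_pattern)] = (own_data, rest_data)
        else:
            # "Rest_vs_A": train on the rest, test on the single dataset.
            result["%s_vs_%s" % (name_pattern, name)] = (rest_data, own_data)
    return result

splits = merge_splits({"A": [1], "B": [2], "C": [3]})
print(sorted(splits))       # ['Rest_vs_A', 'Rest_vs_B', 'Rest_vs_C']
print(splits["Rest_vs_A"])  # ([2, 3], [1])
```

The name_pattern argument mirrors the operation's name_pattern parameter, which replaces the ‘Rest’ part of the result names.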
Specification file Parameters¶
input_path¶
The input path of this operation has to contain several datasets of one of the types time_series or feature_vector. The input datasets must not contain split data.
(obligatory)
name_pattern¶
String to customize ‘Rest’ in the name of the result dataset.
(optional, default: ‘Rest’)
set_flag¶
If set_flag is True, the first time series object that is merged from a different dataset gets a flag. This is only done if the data is stored in pickle format.
(optional, default: True)
collection_constraints¶
Optionally, constraints can be passed to the operation that specify which datasets are used as training data for which test data. For instance, the constraint ‘"%(source_train_collection_name)s".strip("}{").split("}{")[1:] == "%(source_test_collection_name)s".strip("}{").split("}{")[1:]’ would ensure that only datasets that were created by the same processing with the same parameterization are combined.
(optional, default: [])
Exemplary Call¶
type: merge
input_path: "operation_results/2009_8_13_15_8_57"
reverse: False
collection_constraints:
# Combine only collections that have been created using the same parameterization
- '"%(source_train_collection_name)s".strip("}{").split("}{")[1:] == "%(source_test_collection_name)s".strip("}{").split("}{")[1:]'
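For illustration, a constraint string like the one above is a Python expression template: the collection names are substituted via %-formatting and the result is evaluated. The helper below mimics that mechanism (the variable names come from the constraint template; the evaluation procedure itself is an assumption based on the template syntax, not pySPACE's verbatim implementation):

```python
def constraint_holds(constraint, train_name, test_name):
    # Substitute the collection names into the %-template,
    # then evaluate the resulting Python expression.
    expression = constraint % {
        "source_train_collection_name": train_name,
        "source_test_collection_name": test_name,
    }
    return eval(expression)

constraint = ('"%(source_train_collection_name)s".strip("}{").split("}{")[1:]'
              ' == "%(source_test_collection_name)s".strip("}{").split("}{")[1:]')

# Hypothetical collection names "{A}{p1}" and "{B}{p1}" share the
# parameterization part "p1", so the constraint holds for them.
print(constraint_holds(constraint, "{A}{p1}", "{B}{p1}"))  # True
print(constraint_holds(constraint, "{A}{p1}", "{B}{p2}"))  # False
```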
Inheritance diagram for pySPACE.missions.operations.merge:
Class Summary¶
MergeOperation(processes, operation_spec, ...)
    Operation to create ‘All_vs_One’ datasets
MergeProcess(input_collection, ...)
    Create ‘All_vs_One’ collections where ‘All’ are all collections that fulfill the collection_constraints and are different from the “One” collection
Classes¶
MergeOperation¶
class pySPACE.missions.operations.merge.MergeOperation(processes, operation_spec, result_directory, number_processes, create_process=None)¶
    Bases: pySPACE.missions.operations.base.Operation
    Operation to create ‘All_vs_One’ datasets
Class Components Summary
_createProcesses(processes, ...)
    [factory method] Create the MergeProcess object.
consolidate()
    Consolidation of the operation’s results
create(operation_spec, result_directory[, ...])
    [factory method] Create a MergeOperation object.
__init__(processes, operation_spec, result_directory, number_processes, create_process=None)¶
classmethod create(operation_spec, result_directory, debug=False, input_paths=[])¶
    [factory method] Create a MergeOperation object.
    A factory method that creates a MergeOperation based on the information given in the operation specification operation_spec.
-
MergeProcess
¶
-
class
pySPACE.missions.operations.merge.
MergeProcess
(input_collection, result_directory, collection_constraints, reverse, set_flag, name_pattern)[source]¶ Bases:
pySPACE.missions.operations.base.Process
Create ‘All_vs_One’ collections where ‘All’ are all collections that fulfill the collection_constraints and are different from the “One” collection
Restricted to pickle and arff files!
Class Components Summary
__call__()
    Executes this process on the respective modality
_copy_arff_file(input_arff_file_path, ...)
    Copy the arff files and adjust the relation name in the arff file
_copy_file(source_collection_path, ...)
    Copy a dataset to a new destination
_merge_arff_files(target_arff_file_path, ...)
    Copy the instances from the merge arff files to the target arff file
_merge_files(target_collection_path, ...)
    Merge all collections in source_collection_pathes and store them in the target collection
__init__(input_collection, result_directory, collection_constraints, reverse, set_flag, name_pattern)¶
_merge_arff_files(target_arff_file_path, merge_arff_file_pathes, target_collection_name)¶
    Copy the instances from the merge arff files to the target arff file
_copy_arff_file(input_arff_file_path, target_arff_file_path, input_collection_name, target_collection_name)¶
    Copy the arff files and adjust the relation name in the arff file
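The relation-name adjustment can be sketched as rewriting the @relation header line of the arff file while copying all other lines unchanged. This is a simplification under assumed inputs (a list of header/data lines), not the method's actual file handling:

```python
def copy_arff_with_relation(src_lines, target_relation_name):
    # Rewrite the @relation header; attribute declarations and
    # data rows are copied unchanged.
    out = []
    for line in src_lines:
        if line.lower().startswith("@relation"):
            line = "@relation %s" % target_relation_name
        out.append(line)
    return out

source = ["@relation A", "@attribute f1 numeric", "@data", "1.0"]
print(copy_arff_with_relation(source, "Rest_vs_A")[0])  # @relation Rest_vs_A
```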
_merge_files(target_collection_path, source_collection_pathes, train_set_name_suffix, target_collection_params)¶
    Merge all collections in source_collection_pathes and store them in the target collection
    Parameters
        target_collection_path: Path of the dataset in which the data of all other datasets is assembled.
        source_collection_pathes: Paths of the datasets to be merged.
        train_set_name_suffix: Either ‘train’ or ‘test’. Specifies whether the datasets are merged for training or testing.
        target_collection_params: Dictionary with all the parameters of the target dataset.
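For pickle data, the merge step conceptually amounts to concatenating the stored sample lists of all source collections into the target collection. The file layout below is a deliberate simplification (single pickle file per collection), not pySPACE's actual directory structure:

```python
import os
import pickle
import tempfile

def merge_pickle_files(target_path, source_paths):
    # Concatenate the pickled sample lists of all sources into the target.
    merged = []
    for path in source_paths:
        with open(path, "rb") as f:
            merged.extend(pickle.load(f))
    with open(target_path, "wb") as f:
        pickle.dump(merged, f)

# Demo with two tiny source collections
tmp = tempfile.mkdtemp()
for name, data in [("a.pickle", [1, 2]), ("b.pickle", [3])]:
    with open(os.path.join(tmp, name), "wb") as f:
        pickle.dump(data, f)
merge_pickle_files(os.path.join(tmp, "merged.pickle"),
                   [os.path.join(tmp, "a.pickle"), os.path.join(tmp, "b.pickle")])
with open(os.path.join(tmp, "merged.pickle"), "rb") as f:
    print(pickle.load(f))  # [1, 2, 3]
```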
_copy_file(source_collection_path, target_collection_path, train_set_name_suffix)¶
    Copy a dataset to a new destination
    Parameters
        source_collection_path: The path to the dataset that has to be copied.
        target_collection_path: The path to where the dataset should be copied.
        train_set_name_suffix: Either ‘train’ or ‘test’. Specifies whether the target dataset is handled as training or testing data.