shuffle

Module: missions.operations.shuffle

Combine the datasets in the summary pairwise, using one dataset for training and the other for testing

The input of this operation has to contain several comparable datasets of the same type. Depending on whether the input datasets contain split data, the behavior of this operation differs slightly.

Note

This operation creates an output directory with links, not duplicated files!

If the input datasets are not split, the result of this operation contains one dataset for every ordered pair of datasets of the input_path. For instance, if the input consists of the three datasets “A”, “B”, “C”, the result will contain at least the 6 datasets “A_vs_B”, “A_vs_C”, “B_vs_A”, “B_vs_C”, “C_vs_A”, “C_vs_B”. The result dataset “A_vs_B” uses the feature vectors from dataset “A” as training data and the feature vectors from dataset “B” as test data.

If the input datasets contain split data, the input datasets themselves are additionally copied to the result directory, so that the example above would yield 9 datasets. The dataset “X_vs_Y” then uses the train data of the respective split of dataset X for training and the test data of dataset Y for testing.
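The pairing scheme described above can be sketched as follows. This is a minimal illustration, not pySPACE code; the helper name `shuffle_pairs` and its signature are hypothetical.

```python
from itertools import permutations

def shuffle_pairs(dataset_names, include_originals=False):
    """Return the result dataset names produced by combining every
    ordered pair (train, test) of distinct input datasets.

    If include_originals is True (the split-data case described above),
    the input datasets themselves are part of the result as well.
    """
    results = ["%s_vs_%s" % (train, test)
               for train, test in permutations(dataset_names, 2)]
    if include_originals:
        results += list(dataset_names)
    return results

# Three input datasets yield 6 ordered pairs ...
print(shuffle_pairs(["A", "B", "C"]))
# ... and 9 result datasets when the originals are copied along.
print(len(shuffle_pairs(["A", "B", "C"], include_originals=True)))
```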

A typical operation specification file might look like the example shown below.

Specification file Parameters

type

Has to be set to shuffle to use this operation!

(obligatory, shuffle)

input_path

Location of the input data

(obligatory)

dataset_constraints

Optionally, constraints can be passed to the operation that specify, based on the dataset names, which datasets are combined. For instance, the constraint '"%(dataset_name1)s".strip("}{").split("}{")[1:] == "%(dataset_name2)s".strip("}{").split("}{")[1:]' ensures that only datasets are combined that were created by the same preprocessing with the same parameterization.

(optional, default: [])
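The constraint mechanism can be sketched as follows: each constraint is a Python expression template in which `%(dataset_name1)s` and `%(dataset_name2)s` are substituted with the two dataset names before the expression is evaluated. The helper below is a hypothetical illustration of this idea, not the pySPACE implementation.

```python
def fulfills_constraints(constraints, name1, name2):
    """Return True if the pair of dataset names satisfies every
    constraint expression after %-substitution (illustrative sketch)."""
    values = {"dataset_name1": name1, "dataset_name2": name2}
    return all(eval(template % values) for template in constraints)

# The constraint from the parameter description above: compare the
# name parts after the first "{...}" block, i.e. the preprocessing
# and parameterization part of the dataset name.
constraint = ('"%(dataset_name1)s".strip("}{").split("}{")[1:]'
              ' == "%(dataset_name2)s".strip("}{").split("}{")[1:]')

# Same preprocessing/parameterization -> combined
print(fulfills_constraints([constraint], "{A}{fft}{win1}", "{B}{fft}{win1}"))
# Different preprocessing -> not combined
print(fulfills_constraints([constraint], "{A}{fft}{win1}", "{B}{car}{win1}"))
```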

Exemplary Call

type: shuffle
input_path: "operation_results/2009_8_13_15_8_57"
dataset_constraints:
  # Combine only datasets that have been created using the same parameterization
  - '"%(dataset_name1)s".strip("}{").split("}{")[1:] == "%(dataset_name2)s".strip("}{").split("}{")[1:]' 

Inheritance diagram for pySPACE.missions.operations.shuffle:


Class Summary

ShuffleOperation(processes, operation_spec, ...) Forwards processing to process
ShuffleProcess(input_dataset, ...) The shuffle process

Classes

ShuffleOperation

class pySPACE.missions.operations.shuffle.ShuffleOperation(processes, operation_spec, result_directory, number_processes, create_process=None)[source]

Bases: pySPACE.missions.operations.base.Operation

Forwards processing to process

Class Components Summary

_createProcesses(processes, ...) Function that creates the shuffle process.
consolidate() Consolidation of the operation’s results
create(operation_spec, result_directory[, ...]) Factory method that creates a ShuffleOperation
__init__(processes, operation_spec, result_directory, number_processes, create_process=None)[source]
classmethod create(operation_spec, result_directory, debug=False, input_paths=[])[source]

Factory method that creates a ShuffleOperation

A factory method that creates a ShuffleOperation based on the information given in the operation specification operation_spec

classmethod _createProcesses(processes, result_directory, input_datasets, dataset_constraints)[source]

Function that creates the shuffle process.

Create the ShuffleProcess (shuffling is not distributed over different processes)

consolidate()[source]

Consolidation of the operation’s results

ShuffleProcess

class pySPACE.missions.operations.shuffle.ShuffleProcess(input_dataset, result_directory, dataset_constraints)[source]

Bases: pySPACE.missions.operations.base.Process

The shuffle process

Combines datasets that fulfill all dataset_constraints

Class Components Summary

__call__() Executes this process on the respective modality
_copy_arff_file(input_arff_file_name, ...) Copy the arff files and adjust the relation name in the arff file
__init__(input_dataset, result_directory, dataset_constraints)[source]
__call__()[source]

Executes this process on the respective modality

_copy_arff_file(input_arff_file_name, target_arff_file_name, input_dataset_name, target_dataset_name)[source]

Copy the arff files and adjust the relation name in the arff file