transfer_splitter
Module: missions.nodes.splitter.transfer_splitter
Splits data into training and test data
Inheritance diagram for pySPACE.missions.nodes.splitter.transfer_splitter: TransferSplitterNode
class pySPACE.missions.nodes.splitter.transfer_splitter.TransferSplitterNode(wdefs_train, wdefs_test, split_method, wdefs_train_test=None, splits=10, random=False, num_split_instances=None, wdefs_split=None, reverse=False, sort=False, *args, **kwargs)[source]
Bases: pySPACE.missions.nodes.base_node.BaseNode
Allows splitting data into training and test sets according to different window definitions
Splits the available data into disjoint training and test sets. The transfer between different training and test window definitions is supported. The node was implemented with several use cases in mind:
- The training set contains instances of ‘Standard’ and ‘Target’ stimuli, but the test set contains instances of ‘Target’ and ‘MissedTarget’ stimuli.
- The training set contains instances of ‘LRP’ with different training times and ‘NoLRPs’, but the test set should contain sliding windows. Cross-validation should be supported so that the node can be used together with a parameter optimization node.
- The use of merged data sets should be possible.
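As a complement to the use cases above, the following minimal sketch shows how the node could be instantiated directly in Python. The import path follows the module name documented here; the window definition names (‘Standard’, ‘Target’, ‘MissedTarget’) are placeholders taken from the first use case, and in a real flow the node is typically configured via a node chain specification rather than constructed by hand.

    # Sketch only: direct construction of the splitter for the first use case.
    # Window definition names are placeholders and must match the names used
    # in the window spec file when the raw data was segmented.
    from pySPACE.missions.nodes.splitter.transfer_splitter import TransferSplitterNode

    node = TransferSplitterNode(
        wdefs_train=['Standard', 'Target'],     # windows eligible for training sets
        wdefs_test=['Target', 'MissedTarget'],  # windows eligible for test sets
        split_method='all_data',                # only 'Target' occurs in both lists and is split
        splits=5)                               # number of train-test pairs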
Parameters
wdefs_train: A list of window definition names (specified in the window spec file when the raw data was segmented). All windows that belong to one of these window definitions are considered when the training set(s) is/are determined.
wdefs_test: A list of window definition names (specified in the window spec file when the raw data was segmented). All windows that belong to one of these window definitions are considered when the test set(s) is/are determined.
split_method: One of the following strings: ‘all_data’, ‘time’, ‘count’, ‘set_flag’, ‘set_flag_reverse’.
- all_data : All available data is used in every split. This means that only window definitions occurring in both wdefs_train AND wdefs_test are actually split; window definitions that occur in only one of wdefs_train or wdefs_test are retained in every split.
- time : The data is sorted and split according to time. For that, (start_time of last window - start_time of first window)/nr_of_splits is determined. Since time in EEG data is relative for every set, ensure that each input collection consists of only one data set (i.e., is not a merge of several sets) or that the change_option has been used.
- count : The data is split according to num_split_instances. By default, only windows specified in both wdefs_train and wdefs_test are counted. With the parameter wdefs_split, the window definitions that are counted can be specified explicitly. If num_split_instances is not specified, splits determines how many instances of wdefs_split are in one split. A standalone sketch of this chunking behaviour is given below the parameter list.
- set_flag : When the data has been merged with the concatenate operation before, a flag ‘new_set’ has been inserted into the time series specs. Splits are based on this flag, i.e. the splits behave like an inter-set cross-validation. For example, if you merged 3 sets ‘A’, ‘B’, ‘C’, then 3 splits are generated: ‘A’+’B’ vs ‘C’, ‘A’+’C’ vs ‘B’ and ‘B’+’C’ vs ‘A’.
- set_flag_reverse : When the data has been merged with the concatenate operation before, a flag ‘new_set’ has been inserted into the time series specs. Splits are based on this flag, i.e. the splits behave like a reverse inter-set cross-validation. For example, if you merged 3 sets ‘A’, ‘B’, ‘C’, then 3 splits are generated: ‘A’ vs ‘B’+’C’, ‘B’ vs ‘A’+’C’ and ‘C’ vs ‘A’+’B’.
random: If True, the data is randomized before splitting.
Note
It is not guaranteed that overlapping windows will be in the same split for split methods ‘time’ and ‘all_data’!
(optional, default: False)
splits: The number of splits created internally and the number of train-test pairs.
(optional, default: 10)
num_split_instances: If split_method is ‘count’, num_split_instances specifies how many instances will be in one split. After splitting, one split is evaluated according to wdefs_test for the test data set and the remaining splits according to wdefs_train. The test split is iterated. If the total number of counted instances is not divisible by num_split_instances, the last split will contain the remaining instances. If in addition splits is set to 1, only one train-test pair is created with num_split_instances in the training set.
(optional, default: None)
wdefs_split: A list of window definition names (specified in the window spec file when the raw data was segmented). All windows that belong to one of these window definitions are counted when split_method is set to ‘count’.
(optional, default: None)
reverse: If this option is True, the data is split in reverse ordering.
(optional, default: False)
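The chunking performed for split_method ‘count’ (referenced above) can be pictured with plain Python, independent of pySPACE. This is an illustration of the described semantics only, not the node’s actual implementation, and it omits the filtering by wdefs_train and wdefs_test that the node additionally performs:

    # Illustration: group counted windows into chunks of num_split_instances
    # and let each chunk act once as the test portion (cf. split_method 'count').
    def count_splits(windows, num_split_instances):
        # The last chunk holds the remainder if the total is not divisible.
        chunks = [windows[i:i + num_split_instances]
                  for i in range(0, len(windows), num_split_instances)]
        for test_idx, test_chunk in enumerate(chunks):
            train = [w for j, c in enumerate(chunks) if j != test_idx for w in c]
            yield train, test_chunk

    # Example: 7 windows with num_split_instances=3 -> chunks of size 3, 3 and 1.
    for train, test in count_splits(list('ABCDEFG'), 3):
        print(train, test)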
Exemplary Call

    - node : TransferSplitter
      parameters :
          wdefs_train : ['s2', 's1']
          wdefs_test : ['s5', 's2']
          split_method : "all_data"
          splits : 5
Author: Anett Seeland (anett.seeland@dfki.de)
Created: 2011/04/10
LastChange: 2011/11/14 (traintest functionality)
POSSIBLE NODE NAMES:
- TransferSplitterNode
- TransferSplitter
POSSIBLE INPUT TYPES:
- PredictionVector
- FeatureVector
- TimeSeries
Class Components Summary
- _create_split(): Create the split of the data into training and test data.
- input_types
- is_split_node(): Returns whether this is a split node.
- request_data_for_testing()
- request_data_for_training(use_test_data)
- train_sweep(use_test_data): Performs the actual training of the node.
- use_next_split(): Use the next split of the data into training and test data.
__init__(wdefs_train, wdefs_test, split_method, wdefs_train_test=None, splits=10, random=False, num_split_instances=None, wdefs_split=None, reverse=False, sort=False, *args, **kwargs)[source]
use_next_split()[source]
Use the next split of the data into training and test data.
Returns True if more splits are available, otherwise False.
This method is useful for benchmarking.
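A benchmarking loop over all splits might therefore look like the following sketch. The evaluate function is a placeholder, and the exact way data is requested depends on the surrounding node chain, so treat this as an assumption rather than the canonical usage:

    # Hypothetical benchmarking loop; evaluate() is a user-supplied placeholder.
    while True:
        train_data = node.request_data_for_training(use_test_data=False)
        test_data = node.request_data_for_testing()
        evaluate(train_data, test_data)  # score the current train-test pair
        if not node.use_next_split():    # advance; False when no splits remain
            break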
train_sweep(use_test_data)[source]
Performs the actual training of the node.
Note
Split nodes cannot be trained.
input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']