transfer_splitter

Module: missions.nodes.splitter.transfer_splitter

Splits data into training and test data

Inheritance diagram for pySPACE.missions.nodes.splitter.transfer_splitter:


TransferSplitterNode

class pySPACE.missions.nodes.splitter.transfer_splitter.TransferSplitterNode(wdefs_train, wdefs_test, split_method, wdefs_train_test=None, splits=10, random=False, num_split_instances=None, wdefs_split=None, reverse=False, sort=False, *args, **kwargs)[source]

Bases: pySPACE.missions.nodes.base_node.BaseNode

Allows splitting data into training and test data sets according to different window definitions

Splits the available data into disjoint training and test sets. Transferring between different training and test window definitions is supported. The node was implemented with several use cases in mind:

  • The training set contains instances of 'Standard' and 'Target' stimuli, but the test set consists of 'Target' and 'MissedTarget' stimuli.
  • The training set contains instances of 'LRP' with different training times and 'NoLRPs', but the test set should contain sliding windows. Cross-validation should be supported so that the node can be used together with a parameter optimization node.
  • The use of merged data sets should be possible.

Parameters

wdefs_train:

A list with window definition names (specified in the window spec file when the raw data was segmented). All windows that belong to one of these window definitions are considered when the training set(s) are determined.

wdefs_test:

A list with window definition names (specified in the window spec file when the raw data was segmented). All windows that belong to one of these window definitions are considered when the test set(s) are determined.

split_method:

One of the following strings: 'all_data', 'time', 'count', 'set_flag', 'set_flag_reverse'.

  • all_data :
    All available data is used in every split. As a consequence, only window definitions that occur in both wdefs_train and wdefs_test are actually split; window definitions that occur only in wdefs_train or only in wdefs_test are retained in every split.
  • time :
    The data is sorted and split according to time. For that, (start_time of last window - start_time of first window) / nr_of_splits is computed. Since time in EEG data is relative for every set, ensure that each input collection consists of only one data set (i.e. is not a merge of several sets) or that the change_option has been used.
  • count :
    The data is split according to num_split_instances. By default, only windows specified in both wdefs_train and wdefs_test are counted. The window definitions that are counted can be specified explicitly with the parameter wdefs_split. If num_split_instances is not specified, splits determines how many instances of wdefs_split are in one split.
  • set_flag :
    When the data has been merged with the concatenate operation beforehand, a flag 'new_set' has been inserted into the time series specs. Splits are based on this flag, i.e. the splits behave like an inter-set cross-validation. For example, if you merged 3 sets 'A', 'B' and 'C', then 3 splits are generated: 'A'+'B' vs 'C', 'A'+'C' vs 'B' and 'B'+'C' vs 'A' (see the sketch after this list).
  • set_flag_reverse:
    When the data has been merged with the concatenate operation beforehand, a flag 'new_set' has been inserted into the time series specs. Splits are based on this flag, i.e. the splits behave like a reversed inter-set cross-validation. For example, if you merged 3 sets 'A', 'B' and 'C', then 3 splits are generated: 'A' vs 'B'+'C', 'B' vs 'A'+'C' and 'C' vs 'A'+'B'.
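
A minimal sketch of the 'set_flag' / 'set_flag_reverse' pairing described above; the helper name and the example sets are made up for illustration and are not part of the node's implementation:

    # Illustrative leave-one-set-out pairing for split_method 'set_flag';
    # 'set_flag_reverse' swaps the roles of training and test data.
    def inter_set_splits(sets, reverse=False):
        """sets: dict mapping set name -> list of windows (grouped by the
        'new_set' flag). Yields (train, test) lists of windows."""
        for held_out in sets:
            rest = [w for name, windows in sets.items()
                    if name != held_out for w in windows]
            single = list(sets[held_out])
            yield (single, rest) if reverse else (rest, single)

    # Three merged sets 'A', 'B', 'C' give three train-test pairs:
    sets = {'A': ['a1', 'a2'], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']}
    for train, test in inter_set_splits(sets):
        print(len(train), len(test))  # 'B'+'C' vs 'A', 'A'+'C' vs 'B', 'A'+'B' vs 'C'
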
random:

If True, the data is randomized before splitting.

Note

It is not guaranteed that overlapping windows will be in the same split for split methods ‘time’ and ‘all_data’!

(optional, default: False)

splits:

The number of splits created internally and the number of train-test pairs.

(optional, default: 10)

num_split_instances:

If split_method is 'count', num_split_instances specifies how many instances are in one split. After splitting, one split is evaluated according to wdefs_test to form the test set and the remaining splits according to wdefs_train to form the training set. The test split is iterated over all splits. If the total number of counted instances is not divisible by num_split_instances, the last split contains the remaining instances. If, in addition, splits is set to 1, only one train-test pair is created, with num_split_instances instances in the training set (see the sketch below).

(optional, default: None)
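
As a rough illustration of the remainder handling described above (the helper below is hypothetical and not part of the node): with 25 counted windows and num_split_instances=10, the splits contain 10, 10 and 5 instances, and each split serves once as the test split.

    # Hypothetical sketch of the 'count' chunking: groups of
    # num_split_instances, with a smaller last chunk for the remainder.
    def count_chunks(windows, num_split_instances):
        return [windows[i:i + num_split_instances]
                for i in range(0, len(windows), num_split_instances)]

    chunks = count_chunks(list(range(25)), 10)
    print([len(c) for c in chunks])  # [10, 10, 5]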

wdefs_split:

A list with window definition names (specified in the window spec file when the raw data was segmented). All windows that belong to one of these window definitions are counted when split_method is set to 'count'.

(optional, default: None)

reverse:

If this option is True, the data is split in reverse order.

(optional, default: False)

Exemplary Call

-
    node : TransferSplitter
    parameters :
        wdefs_train : ['s2', 's1']
        wdefs_test : ['s5', 's2']
        split_method : "all_data"
        splits : 5
Author:

Anett Seeland (anett.seeland@dfki.de)

Created:

2011/04/10

LastChange:

2011/11/14 (traintest functionality)

POSSIBLE NODE NAMES:
 
  • TransferSplitterNode
  • TransferSplitter
POSSIBLE INPUT TYPES:
 
  • PredictionVector
  • FeatureVector
  • TimeSeries

Class Components Summary

_create_split() Create the split of the data into training and test data.
input_types
is_split_node() Returns whether this is a split node.
request_data_for_testing()
request_data_for_training(use_test_data)
train_sweep(use_test_data) Performs the actual training of the node.
use_next_split() Use the next split of the data into training and test data.
__init__(wdefs_train, wdefs_test, split_method, wdefs_train_test=None, splits=10, random=False, num_split_instances=None, wdefs_split=None, reverse=False, sort=False, *args, **kwargs)[source]
is_split_node()[source]

Returns whether this is a split node.

use_next_split()[source]

Use the next split of the data into training and test data.

Returns True if more splits are available, otherwise False.

This method is useful for benchmarking.
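
A hedged sketch of how such a split node might be driven during benchmarking; how the node receives its input data and what exactly the request_* methods return is simplified here:

    from pySPACE.missions.nodes.splitter.transfer_splitter import TransferSplitterNode

    splitter = TransferSplitterNode(wdefs_train=['s2', 's1'],
                                    wdefs_test=['s5', 's2'],
                                    split_method='all_data',
                                    splits=5)
    # ... node is connected to its input and has collected the data ...
    while True:
        train_data = splitter.request_data_for_training(use_test_data=False)
        test_data = splitter.request_data_for_testing()
        # evaluate the current train-test pair here ...
        if not splitter.use_next_split():  # False once all splits are used
            break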

train_sweep(use_test_data)[source]

Performs the actual training of the node.

Note

Split nodes cannot be trained

request_data_for_training(use_test_data)[source]
request_data_for_testing()[source]
_create_split()[source]

Create the split of the data into training and test data.

input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']