cv_splitter

Module: `missions.nodes.splitter.cv_splitter`

Create splits of the data into train and test data used for cross-validation.

`CrossValidationSplitterNode`

class pySPACE.missions.nodes.splitter.cv_splitter.CrossValidationSplitterNode(splits=10, stratified=True, random=True, time_dependent=False, stratified_class=None, *args, **kwargs)

Bases: pySPACE.missions.nodes.base_node.BaseNode

Perform (stratified) cross-validation

During benchmarking, n pairs of training and test data are generated, where n is configurable via the parameter splits. The n test datasets are pairwise disjoint. Internally, the available data is partitioned into n pairwise disjoint sets s_1, ..., s_n of (approximately) equal size (the “splits”). The i-th pair of training and test data is generated by using s_i as test data and the union of the remaining sets as training data.

By default, the partitioning is stratified, i.e. the splits have the same class ratio as the overall dataset. Also by default, the partitioning is based on shuffling the data randomly. In this case, the partitioning of the data into s_1, ..., s_n is determined solely by the run number (used as random seed), yielding the same split for the same run number and different splits for different run numbers.
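The partitioning described above can be illustrated with a minimal, standalone sketch. It is not the pySPACE implementation: the function names `make_splits` and `train_test_pairs`, the `shuffle` and `run_number` arguments, and the round-robin dealing strategy are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_splits(labels, n_splits=10, stratified=True, shuffle=True, run_number=0):
    """Partition the indices 0..len(labels)-1 into n_splits disjoint lists.

    Hypothetical helper, not part of pySPACE.
    """
    rng = random.Random(run_number)  # the run number acts as the random seed
    if stratified:
        # Handle each class separately so class ratios carry over to the splits.
        by_class = defaultdict(list)
        for index, label in enumerate(labels):
            by_class[label].append(index)
        groups = [by_class[label] for label in sorted(by_class)]
    else:
        groups = [list(range(len(labels)))]

    splits = [[] for _ in range(n_splits)]
    position = 0
    for group in groups:
        if shuffle:
            rng.shuffle(group)
        # Deal indices round-robin so every split gets its share of the group.
        for index in group:
            splits[position % n_splits].append(index)
            position += 1
    return splits


def train_test_pairs(splits):
    """Yield (train, test) index lists; split i is the test set of pair i."""
    for i, test in enumerate(splits):
        train = [idx for j, s in enumerate(splits) if j != i for idx in s]
        yield train, test
```

For example, with 30 instances of one class, 10 of another, and `n_splits=5`, every split in this sketch holds 8 indices (6 of the first class and 2 of the second), and calling `make_splits` twice with the same `run_number` returns the same partition.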

Parameters

splits:
The number of splits created internally. If n data points exist and m splits are created, each of these splits consists of approx. n/m data points.

(optional, default: 10)

stratified:
If True, the cross-validation is stratified, i.e. the overall class ratio is retained in each split (as closely as possible).

(optional, default: True)

random:
If True, the order of the data is randomly shuffled.

(optional, default: True)

time_dependent:
If True, splitting is done separately for different (i.e. non-overlapping) time windows to ensure that instances corresponding to the same marker end up in the same split.

Note

Stratification is only allowed here if each marker has exactly one class label.

(optional, default: False)

stratified_class:

If time_dependent is True and stratified_class is specified, stratification is done only for the specified class label (a string). Instances of the other class fill the splits while preserving the time order of the data. Consequently, random has no effect in this case.

(optional, default: None)
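The idea behind time_dependent, keeping all instances that belong to the same marker in one split, can be sketched as a group-aware partition. This is only an illustration under stated assumptions: the helper `grouped_splits`, the `marker_ids` list (one marker identifier per instance, in time order), and the round-robin assignment of groups are hypothetical and do not reflect how pySPACE actually derives its time windows.

```python
from collections import OrderedDict


def grouped_splits(marker_ids, n_splits):
    """Assign whole marker groups to splits, keeping the time order of groups.

    Hypothetical helper, not part of pySPACE.
    """
    # Collect the instance indices of each marker in order of first appearance.
    groups = OrderedDict()
    for index, marker in enumerate(marker_ids):
        groups.setdefault(marker, []).append(index)

    splits = [[] for _ in range(n_splits)]
    # Deal complete groups round-robin; a group is never divided across splits.
    for position, indices in enumerate(groups.values()):
        splits[position % n_splits].extend(indices)
    return splits
```

Because groups are assigned as a whole, split sizes in this sketch can differ noticeably when markers cover different numbers of instances.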

Exemplary Call

-
    node : CV_Splitter
    parameters :
          splits : 10
          stratified : True

Author:	Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)
Created:	2008/12/16
POSSIBLE NODE NAMES:
	CrossValidationSplitterNode CrossValidationSplitter CV_Splitter
POSSIBLE INPUT TYPES:
	PredictionVector FeatureVector TimeSeries

Class Components Summary

`__hyperparameters`
`_create_splits`()	Create the split of the data for n-fold cross-validation
`input_types`
`is_split_node`()	Return whether this is a split node
`request_data_for_testing`()	Returns the data for testing of subsequent nodes
`request_data_for_training`(use_test_data)	Returns the data for training of subsequent nodes
`train_sweep`(use_test_data)	Performs the actual training of the node.
`use_next_split`()	Use the next split of the data into training and test data.

__init__(splits=10, stratified=True, random=True, time_dependent=False, stratified_class=None, *args, **kwargs)

is_split_node(): Return whether this is a split node

use_next_split()

Use the next split of the data into training and test data.

Returns True if more splits are available, otherwise False.

This method is useful for benchmarking.
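A benchmarking loop built on this contract might look like the following sketch. `ToySplitter` is a hypothetical stand-in that only mimics the documented behaviour (`use_next_split` returns True while another split is available); it is not the pySPACE node, and `run_all_splits` is an illustrative helper.

```python
class ToySplitter:
    """Hypothetical stand-in mimicking the documented split-node methods."""

    def __init__(self, splits):
        self.splits = splits
        self.current = 0  # index of the split currently used as test data

    def use_next_split(self):
        """Advance to the next split; return False if none is left."""
        if self.current + 1 < len(self.splits):
            self.current += 1
            return True
        return False

    def request_data_for_testing(self):
        return self.splits[self.current]

    def request_data_for_training(self, use_test_data=False):
        # Training data is the union of all splits except the current one.
        return [x for i, s in enumerate(self.splits) if i != self.current for x in s]


def run_all_splits(splitter):
    """Visit every train/test pair once and record the pair sizes."""
    results = []
    while True:
        train = splitter.request_data_for_training(use_test_data=False)
        test = splitter.request_data_for_testing()
        results.append((len(train), len(test)))
        if not splitter.use_next_split():
            break
    return results
```

The loop processes the current split first and only then asks for the next one, so the first split is not skipped.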

train_sweep(use_test_data): Performs the actual training of the node.

Note

Split nodes cannot be trained.

request_data_for_training(use_test_data): Returns the data for training of subsequent nodes

request_data_for_testing(): Returns the data for testing of subsequent nodes

_create_splits(): Create the split of the data for n-fold cross-validation

__hyperparameters = set([NoOptimizationParameter<input_dim>, NoOptimizationParameter<dtype>, NoOptimizationParameter<output_dim>, NoOptimizationParameter<random>, NoOptimizationParameter<retrain>, NoOptimizationParameter<time_dependent>, NoOptimizationParameter<kwargs_warning>, NoOptimizationParameter<store>, NoOptimizationParameter<stratified>])

input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']
