cv_splitter
Module: missions.nodes.splitter.cv_splitter
Create splits of the data into train and test data used for cross-validation
CrossValidationSplitterNode
class pySPACE.missions.nodes.splitter.cv_splitter.CrossValidationSplitterNode(splits=10, stratified=True, random=True, time_dependent=False, stratified_class=None, *args, **kwargs)
Bases: pySPACE.missions.nodes.base_node.BaseNode
Perform (stratified) cross-validation
During benchmarking, n pairs of training and test data are generated, where n is configurable via the parameter splits. The n test datasets are pairwise disjoint. Internally, the available data is partitioned into n pairwise disjoint sets s_1, ..., s_n of approximately equal size (the "splits"). The i-th pair of training and test data is generated by using s_i as test data and the union of the remaining sets as training data.
The partitioning is stratified by default, i.e. each split has the same class ratio as the overall dataset. Also by default, the partitioning is based on shuffling the data randomly. In this case, the partitioning of the data into s_1, ..., s_n is determined solely by the run number (used as random seed), so the same run number always yields the same partitioning and two different run numbers yield different ones.
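The partitioning described above can be illustrated with a minimal, self-contained sketch. It is not the node's actual implementation: the function names stratified_splits and train_test_pairs are hypothetical, and dealing each class round-robin over the splits is just one simple way to keep the class ratio roughly constant. The only properties taken from the description are that the splits are disjoint, stratified, and fully determined by the run number used as seed.

```python
import random
from collections import defaultdict


def stratified_splits(labels, n_splits, run_number):
    """Partition the indices of `labels` into n_splits disjoint index lists.

    Hypothetical helper: the run number seeds the shuffle, so the same
    run number always reproduces the same partitioning.
    """
    rng = random.Random(run_number)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    splits = [[] for _ in range(n_splits)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices of this class round-robin over the splits,
        # continuing where the previous class stopped.
        for position, index in enumerate(indices):
            splits[(offset + position) % n_splits].append(index)
        offset += len(indices)
    return splits


def train_test_pairs(splits):
    """Yield (train_indices, test_indices), using split i as test data."""
    for i, test in enumerate(splits):
        train = [idx for j, s in enumerate(splits) if j != i for idx in s]
        yield train, test


labels = ["Target"] * 20 + ["Standard"] * 80
splits = stratified_splits(labels, n_splits=10, run_number=0)
for train, test in train_test_pairs(splits):
    assert not set(train) & set(test)  # training and test data never overlap
```

With 100 instances and 10 splits, every split in this sketch holds 10 instances, 2 of which carry the minority label.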
Parameters
splits: The number of splits created internally. If n data points exist and m splits are created, each of these splits consists of approx. n/m data points.
(optional, default: 10)
stratified: If True, the cross-validation is stratified, i.e. the overall class ratio is retained in each split as closely as possible.
(optional, default: True)
random: If True, the order of the data is randomly shuffled before it is partitioned.
(optional, default: True)
time_dependent: If True, splitting is done separately for different (i.e. non-overlapping) time windows to ensure that instances corresponding to the same marker end up in the same split.
Note
Stratification is only allowed here if each marker has exactly one class label.
(optional, default: False)
stratified_class: If time_dependent is True and stratified_class is specified, stratification is only done for the specified class label (string). Instances of the other class fill up the splits in a way that preserves the time order of the data. This also means that random has no effect in this case.
(optional, default: None)
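The constraint behind time_dependent (instances from overlapping time windows must land in the same split) can be sketched as follows. This is an illustration under stated assumptions, not the node's code: windows are represented as hypothetical (start, end) tuples already sorted by start time, and the helpers group_overlapping and assign_groups are invented for this example. Groups are assigned to splits as contiguous chunks so that the time order is kept.

```python
def group_overlapping(windows):
    """Group indices of (start, end) windows whose intervals overlap.

    Assumes `windows` is sorted by start time (an assumption of this sketch).
    """
    groups = []
    current_end = None
    for index, (start, end) in enumerate(windows):
        if current_end is None or start >= current_end:
            # No overlap with the running group: open a new one.
            groups.append([index])
            current_end = end
        else:
            # Overlaps the running group: keep the instances together.
            groups[-1].append(index)
            current_end = max(current_end, end)
    return groups


def assign_groups(groups, n_splits):
    """Assign whole groups to splits as contiguous chunks in time order."""
    splits = [[] for _ in range(n_splits)]
    for position, group in enumerate(groups):
        splits[position * n_splits // len(groups)].extend(group)
    return splits


windows = [(0, 10), (5, 15), (20, 30), (40, 50), (45, 55), (60, 70)]
groups = group_overlapping(windows)
splits = assign_groups(groups, n_splits=2)
```

Because only whole groups are moved, two windows that overlap can never be separated into training and test data in this sketch.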
Exemplary Call
-
    node : CV_Splitter
    parameters :
        splits : 10
        stratified : True
Author: Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)
Created: 2008/12/16
POSSIBLE NODE NAMES:
- CrossValidationSplitterNode
- CrossValidationSplitter
- CV_Splitter
POSSIBLE INPUT TYPES:
- PredictionVector
- FeatureVector
- TimeSeries
Class Components Summary
__hyperparameters
_create_splits()    Create the split of the data for n-fold cross-validation
input_types
is_split_node()    Return whether this is a split node
request_data_for_testing()    Returns the data for testing of subsequent nodes
request_data_for_training(use_test_data)    Returns the data for training of subsequent nodes
train_sweep(use_test_data)    Performs the actual training of the node.
use_next_split()    Use the next split of the data into training and test data.
__init__(splits=10, stratified=True, random=True, time_dependent=False, stratified_class=None, *args, **kwargs)
use_next_split()
Use the next split of the data into training and test data.
Returns True if more splits are available, otherwise False.
This method is useful for benchmarking.
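A benchmarking loop built on use_next_split might look roughly like the sketch below. ToySplitter is a hypothetical stand-in that only mirrors the documented method names (request_data_for_training, request_data_for_testing, use_next_split); it is not the pySPACE class. It also assumes that use_next_split advances to the next split and returns True when one was available, which is one plausible reading of the description above.

```python
class ToySplitter:
    """Stand-in with the documented method names; not pySPACE code."""

    def __init__(self, data, splits):
        # Simple unstratified partition into `splits` disjoint folds.
        self.folds = [data[i::splits] for i in range(splits)]
        self.current = 0

    def request_data_for_testing(self):
        return self.folds[self.current]

    def request_data_for_training(self, use_test_data):
        # use_test_data only mirrors the documented signature; ignored here.
        return [x for j, fold in enumerate(self.folds)
                if j != self.current for x in fold]

    def use_next_split(self):
        """Advance to the next split; return True if one was available."""
        if self.current + 1 < len(self.folds):
            self.current += 1
            return True
        return False


def run_benchmark(splitter):
    """Visit every split once and record the (train, test) sizes."""
    sizes = []
    while True:
        train = splitter.request_data_for_training(use_test_data=False)
        test = splitter.request_data_for_testing()
        sizes.append((len(train), len(test)))
        if not splitter.use_next_split():
            break
    return sizes


sizes = run_benchmark(ToySplitter(list(range(20)), splits=5))
```

In a real benchmark, the body of the loop would train and evaluate the subsequent nodes instead of recording sizes.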
train_sweep(use_test_data)
Performs the actual training of the node.
Note
Split nodes cannot be trained.
__hyperparameters = set([NoOptimizationParameter<input_dim>, NoOptimizationParameter<dtype>, NoOptimizationParameter<output_dim>, NoOptimizationParameter<random>, NoOptimizationParameter<retrain>, NoOptimizationParameter<time_dependent>, NoOptimizationParameter<kwargs_warning>, NoOptimizationParameter<store>, NoOptimizationParameter<stratified>])
input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']