traintest_splitter

Module: missions.nodes.splitter.traintest_splitter

Split data into one training and one test data set, with options such as randomization or a fixed percentage

Inheritance diagram for pySPACE.missions.nodes.splitter.traintest_splitter:

TrainTestSplitterNode

class pySPACE.missions.nodes.splitter.traintest_splitter.TrainTestSplitterNode(train_ratio=0.5, random=True, num_train_instances=None, class_label='Target', reverse=False, **kwargs)

Bases: pySPACE.missions.nodes.base_node.BaseNode

Split data into one training and one test data set with a fixed ratio

The relative size of the two sets is controlled via the parameter train_ratio.

Warning

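The class ratio is not retained: the proportion of each class in the training and test sets may differ from its proportion in the full data set.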
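To illustrate the ratio-based split and the warning above, here is a minimal sketch in plain Python. It is not the pySPACE implementation: the helper `split_by_ratio`, its `shuffle` and `seed` arguments, the rounding rule, and the `'Standard'` label are assumptions made for the example.

```python
import random


def split_by_ratio(data, train_ratio=0.5, shuffle=True, seed=0):
    """Return (train, test) lists. Illustrative only, not pySPACE code."""
    if not 0.0 <= train_ratio <= 1.0:
        raise ValueError("train_ratio must lie in [0, 1]")
    items = list(data)
    if shuffle:
        # Seeded shuffle so that the example is reproducible (assumption).
        random.Random(seed).shuffle(items)
    # Rounding to the nearest integer is an assumption of this sketch.
    n_train = int(round(train_ratio * len(items)))
    return items[:n_train], items[n_train:]


# Ordered toy data: 4 'Target' samples followed by 6 'Standard' samples.
samples = [(i, "Target") for i in range(4)] + [(i, "Standard") for i in range(4, 10)]

# Keep the original order and take the first 70% for training.
train, test = split_by_ratio(samples, train_ratio=0.7, shuffle=False)
train_targets = sum(1 for _, label in train if label == "Target")
test_targets = sum(1 for _, label in test if label == "Target")
```

In this toy case the training set holds all four 'Target' samples (4 of 7), while the full data set has 4 of 10 and the test set has none, so the class ratio differs between the sets.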

Parameters

train_ratio:

The ratio of the overall available data that is assigned to the training set. The remaining data (1-train_ratio) is used for testing.

(optional, default: 0.5)

num_train_instances:

Instead of specifying train_ratio, this option specifies the absolute number of instances of class class_label that the training set should contain. All instances encountered until num_train_instances instances of that class have been found are used for training. The remaining data are used for testing.

(optional, default: None)

class_label:

If the num_train_instances option is used, this string determines the class whose training examples are counted.

(optional, default: 'Target')

random:

If False, the order of the data is retained, i.e. the first train_ratio fraction of the instances is used for training and the remainder as test data. If True, the two sets are sampled randomly from the data without taking the data's order into consideration.

(optional, default: True)
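The num_train_instances rule described above can be sketched as follows. This is a hedged illustration in plain Python, not the node's actual code: the helper `split_by_class_count`, the `(sample, label)` tuple layout, and the choice to include the instance that completes the count in the training set are all assumptions.

```python
def split_by_class_count(data, num_train_instances, class_label="Target"):
    """Split labelled data once enough instances of class_label were seen.

    data is an iterable of (sample, label) pairs. Illustrative only.
    """
    if num_train_instances < 1:
        raise ValueError("num_train_instances must be at least 1")
    items = list(data)
    found = 0
    cut = len(items)  # if the count is never reached, everything is training data
    for index, (_, label) in enumerate(items):
        if label == class_label:
            found += 1
            if found == num_train_instances:
                # Assumption: the instance completing the count belongs to training.
                cut = index + 1
                break
    return items[:cut], items[cut:]


labels = ["Standard", "Target", "Standard", "Target", "Standard", "Target"]
labelled = list(zip(range(len(labels)), labels))

# Collect training data until two 'Target' instances have been found.
count_train, count_test = split_by_class_count(labelled, num_train_instances=2)
```

With this input the second 'Target' instance sits at position 3, so the first four instances go to the training set and the last two to the test set.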

Exemplary Call

-
    node : TrainTestSplitter
    parameters :
          train_ratio : 0.7
          random : False

Author:

Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)

Created:

2010/03/08 (Documentation, old node)

LastChange:

2011/11/14 (Documentation) Anett Seeland

POSSIBLE NODE NAMES:

  • TrainTestSplitterNode
  • TrainTestSplitter

POSSIBLE INPUT TYPES:

  • PredictionVector
  • FeatureVector
  • TimeSeries

Class Components Summary

_create_split()                            Create the split of the data into training and test data.
input_types
is_split_node()                            Returns whether this is a split node.
request_data_for_testing()                 Returns the data for testing of subsequent nodes
request_data_for_training(use_test_data)   Returns the data for training of subsequent nodes
train_sweep(use_test_data)                 Performs the actual training of the node.
use_next_split()                           Use the next split of the data into training and test data.

__init__(train_ratio=0.5, random=True, num_train_instances=None, class_label='Target', reverse=False, **kwargs)

is_split_node()

Returns whether this is a split node.

use_next_split()

Use the next split of the data into training and test data.

Returns True if more splits are available, otherwise False.

This method is useful for benchmarking.
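A benchmarking loop built on this return value might look like the sketch below. `ToySplitter` is a hypothetical stand-in that only mimics the method names documented on this page; it is not a pySPACE class, and `evaluate_all_splits` is likewise an illustrative helper rather than part of the framework.

```python
class ToySplitter:
    """Hypothetical stand-in exposing the method names documented here."""

    def __init__(self, splits):
        self._splits = splits  # list of (train, test) pairs
        self._current = 0

    def request_data_for_training(self, use_test_data=False):
        return self._splits[self._current][0]

    def request_data_for_testing(self):
        return self._splits[self._current][1]

    def use_next_split(self):
        # Advance to the next split and report whether one was available.
        if self._current + 1 < len(self._splits):
            self._current += 1
            return True
        return False


def evaluate_all_splits(splitter, evaluate):
    """Call evaluate(train, test) once per split and collect the results."""
    results = []
    while True:
        train = splitter.request_data_for_training(False)
        test = splitter.request_data_for_testing()
        results.append(evaluate(train, test))
        if not splitter.use_next_split():
            break
    return results


toy = ToySplitter([([1, 2], [3]), ([1, 3], [2])])
sizes = evaluate_all_splits(toy, lambda tr, te: (len(tr), len(te)))
```

The loop evaluates the current split first and only then asks for the next one, so a splitter that offers a single split is still evaluated exactly once.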

train_sweep(use_test_data)

Performs the actual training of the node.

Note

Split nodes cannot be trained.

request_data_for_training(use_test_data)

Returns the data for training of subsequent nodes.

request_data_for_testing()

Returns the data for testing of subsequent nodes.

_create_split()

Create the split of the data into training and test data.

input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']