traintest_splitter
Module: missions.nodes.splitter.traintest_splitter
Split data into one training set and one test set, subject to options such as randomization or a fixed percentage.
[Figure: inheritance diagram for pySPACE.missions.nodes.splitter.traintest_splitter]
TrainTestSplitterNode
class pySPACE.missions.nodes.splitter.traintest_splitter.TrainTestSplitterNode(train_ratio=0.5, random=True, num_train_instances=None, class_label='Target', reverse=False, **kwargs)
Bases: pySPACE.missions.nodes.base_node.BaseNode
Split data into one training set and one test set with a fixed ratio.
The relative size of the two sets is controlled via the parameter train_ratio.
Warning
The class ratio of the data is not retained in the two sets.
Parameters
train_ratio: The fraction of the available data that is assigned to the training set. The remaining data (1 - train_ratio) is used for testing.
(optional, default: 0.5)
num_train_instances: Instead of a train_ratio, this option specifies the absolute number of instances of class class_label that should be in the training set. All instances that occur until num_train_instances instances of that class have been found are used for training. The remaining data is used for testing.
(optional, default: None)
class_label: If the num_train_instances option is used, this string determines the class whose training examples are counted.
(optional, default: 'Target')
random: If False, the order of the data is retained, i.e. the first train_ratio fraction of the instances is used for training and the remaining instances are used as test data. If True, the two sets are sampled randomly from the data without taking the order of the data into account.
(optional, default: True)
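The two splitting modes described above can be illustrated with a minimal, standalone sketch. This is not the pySPACE implementation: the function names `split_by_ratio` and `split_by_class_count`, the `(sample, label)` tuple layout, the `shuffle` and `seed` arguments, the rounding of the training-set size, and the choice to include the last counted instance in the training set are all assumptions made for illustration.

```python
import random


def split_by_ratio(data, train_ratio=0.5, shuffle=True, seed=0):
    """Split `data` into (train, test) lists by a fixed ratio.

    Hypothetical helper; `shuffle` plays the role of the node's
    `random` parameter.
    """
    indices = list(range(len(data)))
    if shuffle:
        # A fixed seed keeps the illustration reproducible (assumption).
        random.Random(seed).shuffle(indices)
    # Rounding to the nearest integer is an assumption of this sketch.
    n_train = int(round(train_ratio * len(data)))
    train = [data[i] for i in indices[:n_train]]
    test = [data[i] for i in indices[n_train:]]
    return train, test


def split_by_class_count(data, num_train_instances, class_label="Target"):
    """Split ordered (sample, label) pairs by counting one class.

    Everything up to and including the num_train_instances-th sample
    of `class_label` goes to training; the rest goes to testing.
    """
    seen = 0
    for position, (_, label) in enumerate(data):
        if label == class_label:
            seen += 1
            if seen == num_train_instances:
                return list(data[:position + 1]), list(data[position + 1:])
    # Fewer matching instances than requested: everything is training data.
    return list(data), []
```

Neither helper balances the classes between the two sets, which mirrors the warning above that the class ratio is not retained.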
Exemplary Call
    -
        node : TrainTestSplitter
        parameters :
            train_ratio : 0.7
            random : False
Author: Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)
Created: 2010/03/08 (Documentation, old node)
LastChange: 2011/11/14 (Documentation) Anett Seeland
POSSIBLE NODE NAMES:
- TrainTestSplitterNode
- TrainTestSplitter
POSSIBLE INPUT TYPES:
- PredictionVector
- FeatureVector
- TimeSeries
Class Components Summary
_create_split(): Create the split of the data into training and test data.
input_types
is_split_node(): Returns whether this is a split node.
request_data_for_testing(): Returns the data for testing of subsequent nodes.
request_data_for_training(use_test_data): Returns the data for training of subsequent nodes.
train_sweep(use_test_data): Performs the actual training of the node.
use_next_split(): Use the next split of the data into training and test data.
__init__(train_ratio=0.5, random=True, num_train_instances=None, class_label='Target', reverse=False, **kwargs)
use_next_split()
Use the next split of the data into training and test data.
Returns True if more splits are available, otherwise False.
This method is useful for benchmarking.
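A benchmarking loop built on this return value might look like the sketch below. `FakeSplitNode`, its `current_split` method, and `evaluate_all_splits` are hypothetical stand-ins written for this illustration and are not part of pySPACE; the sketch also assumes that use_next_split returns False once the last split is already active.

```python
class FakeSplitNode:
    """Hypothetical stand-in for a split node; not part of pySPACE."""

    def __init__(self, splits):
        # `splits` is a list of (train, test) pairs prepared in advance.
        self._splits = splits
        self._current = 0

    def current_split(self):
        """Return the (train, test) pair that is currently active."""
        return self._splits[self._current]

    def use_next_split(self):
        """Advance to the next split; return True if one was available."""
        if self._current + 1 < len(self._splits):
            self._current += 1
            return True
        return False


def evaluate_all_splits(node, score):
    """Apply `score(train, test)` to every split and collect the results."""
    results = []
    while True:
        train, test = node.current_split()
        results.append(score(train, test))
        # Stop as soon as the node reports that no further split exists.
        if not node.use_next_split():
            break
    return results
```

With a single train/test split, as this node produces, such a loop would run exactly once under the stated assumption.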
train_sweep(use_test_data)
Performs the actual training of the node.
Note
Split nodes cannot be trained.
input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']