consume_training_data

Module: missions.nodes.meta.consume_training_data

Splits training data into a part for internal use and a part for use by successor nodes

ConsumeTrainingDataNode

class pySPACE.missions.nodes.meta.consume_training_data.ConsumeTrainingDataNode(wrapped_node, consumption_rate, random_seed=0, *args, **kwargs)

Bases: pySPACE.missions.nodes.base_node.BaseNode

Splits training data into a part for internal use and a part for use by successor nodes

This node handles situations in which a model must be trained and later evaluated on the given training data, for example because using test data is not allowed. Simply training and evaluating the model on the same data is not an option, since the evaluation would be strongly optimistically biased: the model is well adapted to the data it was trained on.
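
The optimistic bias can be illustrated with a deliberately extreme toy model. The following sketch is not pySPACE code: `MemorizingModel`, `accuracy`, and the parity labels are hypothetical and were chosen only so that the outcome can be checked by hand.

```python
class MemorizingModel:
    """Hypothetical toy model that memorizes labels and cannot generalize."""

    def __init__(self, default_label=0):
        self.default_label = default_label
        self.memory = {}

    def train(self, samples, labels):
        # Store the label of every training sample verbatim.
        for sample, label in zip(samples, labels):
            self.memory[sample] = label

    def predict(self, sample):
        # Unseen samples fall back to the default label.
        return self.memory.get(sample, self.default_label)


def accuracy(model, samples, labels):
    """Fraction of samples for which the prediction matches the label."""
    hits = sum(model.predict(s) == l for s, l in zip(samples, labels))
    return hits / len(samples)


samples = list(range(100))
labels = [s % 2 for s in samples]  # assumed labels: parity of the sample

model = MemorizingModel()
model.train(samples[:80], labels[:80])

# Evaluating on the training data looks perfect ...
train_accuracy = accuracy(model, samples[:80], labels[:80])
# ... while held-out data reveals that nothing was learned.
held_out_accuracy = accuracy(model, samples[80:], labels[80:])
```

Here `train_accuracy` is 1.0 while `held_out_accuracy` is 0.5, which is chance level for two balanced classes. Real models are less extreme, but the direction of the bias is the same.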

One example is a node chain that is trained on the current data and later combined with an ensemble of node chains trained on historic data. The ensemble should not be trained on the same data that was used to train the node chain.

This node therefore splits the training data into two parts: one for internal use (training the model) and one for use by successor nodes (model evaluation). The fraction of training data used internally is controlled by the argument consumption_rate (a value between 0.0 and 1.0).
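
A minimal sketch of such a split is shown below. It is an illustration under assumptions, not the node's actual implementation: the function name `split_training_data` is hypothetical, and this page does not state whether the real node shuffles the data or splits it separately per label.

```python
import random


def split_training_data(train_set, consumption_rate, random_seed=0):
    """Split train_set into an internal and an external part (sketch)."""
    if not 0.0 <= consumption_rate <= 1.0:
        raise ValueError("consumption_rate must lie between 0.0 and 1.0")
    # A dedicated generator keeps the split reproducible for a given seed.
    rng = random.Random(random_seed)
    shuffled = list(train_set)
    rng.shuffle(shuffled)
    split_index = int(round(len(shuffled) * consumption_rate))
    # First part: consumed internally; second part: left for successors.
    return shuffled[:split_index], shuffled[split_index:]


# Ten (sample, label) pairs; 80% are consumed internally.
train_set = [(x, x % 2) for x in range(10)]
internal, external = split_training_data(train_set, consumption_rate=0.8)
```

With ten samples and a consumption_rate of 0.8, eight samples end up in `internal` and two in `external`. The two parts are disjoint, and the same seed yields the same split.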

Note

When defining this node in the pySPACE YAML syntax, “wrapped_node” can be the definition of a node in YAML syntax (see below). The node object is then created automatically based on this definition.

Parameters

wrapped_node: The node that is trained with the internally used training data.
consumption_rate: The rate of training data that is used internally for training wrapped_node. The remaining data is supplied for the successor nodes.
random_seed: The seed of the random generator. Defaults to 0.

Exemplary Call

-
    node: ConsumeTrainingData
    parameters : 
         consumption_rate : 0.8
         wrapped_node : 
              node : Flow_Node
              parameters :
                   input_dim : 64
                   output_dim : 1
                   nodes : 
                      ......

Author:

Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)

Created:

2010/08/06

POSSIBLE NODE NAMES:

  • ConsumeTrainingData
  • ConsumeTrainingDataNode

POSSIBLE INPUT TYPES:

  • PredictionVector
  • FeatureVector
  • TimeSeries

Class Components Summary

_execute(data)                            Executes the node on the given data vector data
_get_train_set([use_test_data])           Returns the data that can be used for training
_stop_training()                          Finish the training of the node.
_train(data, label)                       Trains the wrapped nodes on the given data vector data
get_output_type(input_type[, as_string])  Return the output type
input_types
is_supervised()                           Returns whether this node requires supervised training
is_trainable()                            Returns whether this node is trainable.
node_from_yaml(node_spec)                 Creates a node based on the node_spec to overwrite default
request_data_for_training(use_test_data)  Returns data for training of subsequent nodes
store_state(result_dir[, index])          Stores this node in the given directory result_dir

__init__(wrapped_node, consumption_rate, random_seed=0, *args, **kwargs)

static node_from_yaml(node_spec)

Creates a node based on node_spec, overriding the default node_from_yaml behavior.
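
How a nested wrapped_node definition can be turned into a node object is sketched below. The sketch does not use pySPACE: `ToyNode`, `ToyWrapperNode`, `NODE_REGISTRY`, and `node_from_spec` are hypothetical stand-ins, and the spec is given as an already parsed dictionary rather than as YAML text.

```python
class ToyNode:
    """Hypothetical stand-in for an ordinary node."""

    def __init__(self, **parameters):
        self.parameters = parameters


class ToyWrapperNode:
    """Hypothetical stand-in for a node that wraps another node."""

    def __init__(self, wrapped_node, consumption_rate, random_seed=0):
        self.wrapped_node = wrapped_node
        self.consumption_rate = consumption_rate
        self.random_seed = random_seed


# Hypothetical registry that maps node names to classes.
NODE_REGISTRY = {"Toy": ToyNode, "ToyWrapper": ToyWrapperNode}


def node_from_spec(node_spec):
    """Build a node from a parsed spec, resolving wrapped_node first."""
    parameters = dict(node_spec.get("parameters", {}))
    if "wrapped_node" in parameters:
        # Replace the nested definition with the node object built from it.
        parameters["wrapped_node"] = node_from_spec(parameters["wrapped_node"])
    return NODE_REGISTRY[node_spec["node"]](**parameters)


spec = {
    "node": "ToyWrapper",
    "parameters": {
        "consumption_rate": 0.8,
        "wrapped_node": {
            "node": "Toy",
            "parameters": {"input_dim": 64, "output_dim": 1},
        },
    },
}
node = node_from_spec(spec)
```

The wrapper receives a ready-made node object instead of a dictionary, which matches the behavior described in the note on wrapped_node above.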

is_trainable()

Returns whether this node is trainable.

is_supervised()

Returns whether this node requires supervised training

_get_train_set(use_test_data=False)

Returns the data that can be used for training

request_data_for_training(use_test_data)

Returns data for training of subsequent nodes

_train(data, label)

Trains the wrapped nodes on the given data vector data

_stop_training()

Finish the training of the node.

_execute(data)

Executes the node on the given data vector data

store_state(result_dir, index=None)

Stores this node in the given directory result_dir

get_output_type(input_type, as_string=True)

Return the output type

This method delegates to the corresponding method of the wrapped node.
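
The delegation can be sketched as follows. `ToyClassifier` and `ToyWrapper` are hypothetical classes used only for illustration. The assumption that the wrapped node always reports "PredictionVector" is made up for this example.

```python
class ToyClassifier:
    """Hypothetical wrapped node."""

    def get_output_type(self, input_type, as_string=True):
        # Assumed behavior: this toy node always emits prediction vectors.
        return "PredictionVector"


class ToyWrapper:
    """Hypothetical wrapper that forwards the query to its wrapped node."""

    def __init__(self, wrapped_node):
        self.wrapped_node = wrapped_node

    def get_output_type(self, input_type, as_string=True):
        # The wrapper's output type is whatever the wrapped node reports.
        return self.wrapped_node.get_output_type(input_type,
                                                 as_string=as_string)


wrapper = ToyWrapper(ToyClassifier())
output_type = wrapper.get_output_type("FeatureVector")
```

In this sketch the wrapper has no output type of its own, so changing the wrapped node changes the reported output type as well.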

input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']