instance_selection

Module: missions.nodes.data_selection.instance_selection

Select only a subset of the instances

Inheritance diagram for pySPACE.missions.nodes.data_selection.instance_selection:


Class Summary

InstanceSelectionNode([...]) Retain only a certain percentage of the instances
ReduceOverrepresentedClassNode(**kwargs) Reject instances to balance categories for classification

Classes

InstanceSelectionNode

class pySPACE.missions.nodes.data_selection.instance_selection.InstanceSelectionNode(train_percentage_selected=100, test_percentage_selected=100, reduce_class=False, num_train_instances=None, class_label='Target', random=True, **kwargs)

Bases: pySPACE.missions.nodes.base_node.BaseNode

Retain only a certain percentage of the instances

The node InstanceSelectionNode forwards to its successor node only train_percentage_selected percent of the training instances and only test_percentage_selected percent of the test instances it receives. The forwarded instances are selected randomly, but in a way that preserves the class ratio.
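The class-ratio-preserving selection described above can be sketched as follows. This is a minimal illustration, not pySPACE's actual implementation: it assumes the instances are (sample, label) pairs, and the helper name select_percentage_stratified and its rounding rule (round down per class) are hypothetical.

```python
import random
from collections import defaultdict


def select_percentage_stratified(instances, percentage, seed=None):
    """Hypothetical helper: keep `percentage` percent of the (sample, label)
    pairs, drawn at random within each class so the class ratio is preserved."""
    rng = random.Random(seed)
    # Group instance indices by class label
    indices_by_label = defaultdict(list)
    for index, (_, label) in enumerate(instances):
        indices_by_label[label].append(index)
    kept = []
    for indices in indices_by_label.values():
        # Round down per class (an assumption; the node may round differently)
        n_keep = int(len(indices) * percentage / 100)
        kept.extend(rng.sample(indices, n_keep))
    # Return the selected instances in their original order
    return [instances[i] for i in sorted(kept)]


if __name__ == "__main__":
    data = [(i, "Target") for i in range(10)] + [(i, "Standard") for i in range(90)]
    subset = select_percentage_stratified(data, 80, seed=0)
    # 8 of 10 Target and 72 of 90 Standard instances are kept
    print(sum(1 for _, label in subset if label == "Target"), len(subset))
```

Sampling within each class separately is what keeps the Target/Standard ratio (here 1:9) unchanged in the forwarded subset.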

If reduce_class is used, only the chosen class is reduced and the class ratio is not preserved. Consequently, the total amount of forwarded data does not match the percentage values.
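To see why the totals differ, the following hypothetical sketch (same (sample, label) assumption; the helper name reduce_single_class is illustrative only) keeps all instances of the other classes and only percentage percent of the class given by reduce_class. With 10 Target and 90 Standard instances, an 80 percent setting with reduce_class set to Standard keeps 72 + 10 = 82 instances, i.e. 82 percent rather than 80 percent of the data.

```python
import random


def reduce_single_class(instances, percentage, reduce_class, seed=None):
    """Hypothetical helper: keep only `percentage` percent of the instances
    labelled `reduce_class`; instances of all other classes are kept."""
    rng = random.Random(seed)
    candidates = [i for i, (_, label) in enumerate(instances) if label == reduce_class]
    n_keep = int(len(candidates) * percentage / 100)
    kept = set(rng.sample(candidates, n_keep))
    # Keep every instance of the other classes, and the sampled ones of reduce_class
    return [inst for i, inst in enumerate(instances)
            if inst[1] != reduce_class or i in kept]


if __name__ == "__main__":
    data = [(i, "Target") for i in range(10)] + [(i, "Standard") for i in range(90)]
    subset = reduce_single_class(data, 80, "Standard", seed=0)
    print(len(subset))  # 82: 72 Standard + 10 Target instances
```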

Parameters
train_percentage_selected:

The percentage of training instances that is forwarded to the successor node.

(optional, default: 100)

test_percentage_selected:

The percentage of test instances that is forwarded to the successor node.

(optional, default: 100)

reduce_class:

If you want to reduce only one class, set this parameter to that class's label (e.g. Standard); otherwise, both classes are reduced in a balanced fashion.

(optional, default: False)

num_train_instances:

Instead of specifying train_percentage_selected, this option allows you to specify the absolute number of training instances of class class_label that should be in the training set. All instances (of any class) that occur until num_train_instances instances of class class_label have been found are used for training.

(optional, default: None)

class_label:

If the num_train_instances option is used, this string determines the class whose training examples are counted.

(optional, default: ‘Target’)

random:

If False, the order of the data is retained, i.e., the first X percent of the training instances (or, if num_train_instances is set, the leading instances up to the required count) are used for training. If True, the training data is sampled randomly, regardless of its order.

(optional, default: True)
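The effect of the random option can be illustrated with the hypothetical helper below, which for brevity ignores class stratification and assumes a plain list of instances. The flag is called use_random here only to avoid shadowing Python's random module.

```python
import random


def select_training_fraction(instances, percentage, use_random=True, seed=None):
    """Hypothetical helper illustrating the `random` option.

    use_random=False: return the first `percentage` percent, in their original order.
    use_random=True: return the same number of instances drawn at random.
    """
    n_keep = int(len(instances) * percentage / 100)
    if not use_random:
        return list(instances[:n_keep])
    rng = random.Random(seed)
    return rng.sample(instances, n_keep)


if __name__ == "__main__":
    data = list(range(10))
    print(select_training_fraction(data, 30, use_random=False))  # [0, 1, 2]
    print(select_training_fraction(data, 30, seed=0))  # three random elements of data
```

The order-preserving variant matters for data such as time series, where the first part of a recording should be used for training and later parts held back.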

Exemplary call

-
    node : InstanceSelection
    parameters : 
        train_percentage_selected : 80
        test_percentage_selected : 100
        reduce_class : Standard

Author:

Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)

Created:

2010/03/31

POSSIBLE NODE NAMES:

  • InstanceSelection
  • InstanceSelectionNode
  • RandomInstanceSelection

POSSIBLE INPUT TYPES:

  • PredictionVector
  • FeatureVector
  • TimeSeries

Class Components Summary

_execute(time_series)
get_num_data(iterator) Return a list of instances that contains num_train_instances instances of class class_label and all other instances that occur up to this point
input_types
request_data_for_testing() Returns data for testing of subsequent nodes
request_data_for_training(use_test_data) Returns data for training of subsequent nodes
__init__(train_percentage_selected=100, test_percentage_selected=100, reduce_class=False, num_train_instances=None, class_label='Target', random=True, **kwargs)
get_num_data(iterator)

Return a list of instances that contains num_train_instances instances of class class_label, together with all other instances that occur up to this point.
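The behaviour described for get_num_data can be sketched as below. This is a hypothetical, simplified stand-in rather than the node's source: it assumes the iterator yields (sample, label) pairs, and collect_until_count is an illustrative name. It stops reading as soon as the num_train_instances-th instance of class_label has been seen; if the iterator holds fewer such instances, everything is returned.

```python
def collect_until_count(iterator, num_train_instances, class_label="Target"):
    """Hypothetical sketch: collect (sample, label) pairs until
    `num_train_instances` instances labelled `class_label` have been seen,
    keeping every instance (of any class) encountered up to that point."""
    selected = []
    if num_train_instances <= 0:
        return selected
    count = 0
    for sample, label in iterator:
        selected.append((sample, label))
        if label == class_label:
            count += 1
            if count >= num_train_instances:
                break  # stop without consuming further instances
    return selected


if __name__ == "__main__":
    stream = [(0, "Standard"), (1, "Target"), (2, "Standard"),
              (3, "Standard"), (4, "Target"), (5, "Standard")]
    # The second Target is at position 4, so instances 0 to 4 are returned
    print(collect_until_count(iter(stream), 2))
```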

request_data_for_training(use_test_data)

Returns data for training of subsequent nodes

Note

This method works differently in InstanceSelectionNode than in other nodes: only train_percentage_selected percent of the available data is returned.

request_data_for_testing()

Returns data for testing of subsequent nodes

_execute(time_series)
input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']

ReduceOverrepresentedClassNode

class pySPACE.missions.nodes.data_selection.instance_selection.ReduceOverrepresentedClassNode(**kwargs)

Bases: pySPACE.missions.nodes.base_node.BaseNode

Reject instances to balance categories for classification

The node forwards only a reduced number of the training and test instances of the larger class, so that the classes end up balanced. The forwarded instances of the larger class are selected randomly. All data of the underrepresented class is forwarded.
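The balancing can be sketched as random undersampling. The helper below is hypothetical and not the node's balance_instances method: it assumes (sample, label) pairs and, generalizing the two-class description, subsamples every class down to the size of the smallest one, which is kept completely.

```python
import random
from collections import defaultdict


def balance_by_undersampling(instances, seed=None):
    """Hypothetical sketch: randomly drop instances of larger classes until
    every class is as large as the smallest one."""
    if not instances:
        return []
    rng = random.Random(seed)
    indices_by_label = defaultdict(list)
    for index, (_, label) in enumerate(instances):
        indices_by_label[label].append(index)
    minority_size = min(len(indices) for indices in indices_by_label.values())
    kept = []
    for indices in indices_by_label.values():
        # Sampling minority_size from the smallest class keeps all of it
        kept.extend(rng.sample(indices, minority_size))
    # Preserve the original order of the surviving instances
    return [instances[i] for i in sorted(kept)]


if __name__ == "__main__":
    data = [(i, "Target") for i in range(10)] + [(i, "Standard") for i in range(90)]
    balanced = balance_by_undersampling(data, seed=0)
    print(len(balanced))  # 20: all 10 Target plus 10 randomly chosen Standard
```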

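Parameters

This node has no node-specific parameters; it accepts only the generic keyword arguments (**kwargs) of the base node.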

Exemplary call

-
    node : Reduce_Overrepresented_Class

Author:

Hendrik Woehrle (hendrik.woehrle@dfki.de)

Created:

2010/09/22

POSSIBLE NODE NAMES:

  • ReduceOverrepresentedClass
  • Reduce_Overrepresented_Class
  • ReduceOverrepresentedClassNode

POSSIBLE INPUT TYPES:

  • PredictionVector
  • FeatureVector
  • TimeSeries

Class Components Summary

_execute(time_series)
balance_instances(all_instances) Method that performs the rejection of instances in the oversized class
input_types
request_data_for_testing() Returns data for testing of subsequent nodes
request_data_for_training(use_test_data) Returns data for training of subsequent nodes
__init__(**kwargs)
request_data_for_training(use_test_data)

Returns data for training of subsequent nodes

request_data_for_testing()

Returns data for testing of subsequent nodes

_execute(time_series)
balance_instances(all_instances)

Method that performs the rejection of instances in the oversized class

input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']