instance_selection
Module: missions.nodes.data_selection.instance_selection
Select only a part of the instances
Inheritance diagram for pySPACE.missions.nodes.data_selection.instance_selection
Class Summary
InstanceSelectionNode ([...]) | Retain only a certain percentage of the instances
ReduceOverrepresentedClassNode (**kwargs) | Reject instances to balance categories for classification
Classes
InstanceSelectionNode
- class pySPACE.missions.nodes.data_selection.instance_selection.InstanceSelectionNode(train_percentage_selected=100, test_percentage_selected=100, reduce_class=False, num_train_instances=None, class_label='Target', random=True, **kwargs)
Bases: pySPACE.missions.nodes.base_node.BaseNode
Retain only a certain percentage of the instances
The node InstanceSelectionNode forwards only train_percentage_selected percent of the training instances passed to it to the successor node, and only test_percentage_selected percent of the test instances. The forwarded instances are selected randomly, but in such a way that the class ratio is kept.
If reduce_class is used, only the chosen class is reduced and the class ratio is not kept. In that case the total amount of retained data does not match the percentage values.
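The class-ratio-preserving selection described above can be sketched in plain Python. This is a minimal illustration, not the pySPACE implementation: the helper name select_percentage, its shuffle and seed arguments, the (data, label) pair representation, and the choice to return the kept instances in their original order are all assumptions made for this example.

```python
import random
from collections import defaultdict


def select_percentage(instances, percentage, shuffle=True, seed=None):
    """Keep about `percentage` percent of every class in `instances`.

    `instances` is a list of (data, label) pairs. Hypothetical helper,
    not part of pySPACE.
    """
    rng = random.Random(seed)

    # Group the positions of the instances by class label.
    by_label = defaultdict(list)
    for index, (_, label) in enumerate(instances):
        by_label[label].append(index)

    kept = []
    for indices in by_label.values():
        n_keep = int(round(len(indices) * percentage / 100.0))
        if shuffle:
            # Random choice within the class keeps the class ratio.
            chosen = rng.sample(indices, n_keep)
        else:
            # Order-preserving variant: take the first instances of the class.
            chosen = indices[:n_keep]
        kept.extend(chosen)

    # Assumption of this sketch: survivors are returned in original order.
    return [instances[i] for i in sorted(kept)]
```

With 10 'Target' and 40 'Standard' instances and percentage=80, this sketch keeps 8 and 32 instances respectively, so the 1:4 ratio is unchanged.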
- Parameters
train_percentage_selected: The percentage of training instances which is forwarded to successor node.
(optional, default: 100)
test_percentage_selected: The percentage of test instances which is forwarded to successor node.
(optional, default: 100)
reduce_class: If only one class should be reduced, set this parameter to that class; otherwise, both classes are reduced in a balanced fashion.
(optional, default: False)
num_train_instances: Instead of specifying train_percentage_selected, this option specifies the absolute number of training instances of class class_label that should be in the training set. All instances that occur until num_train_instances instances of that class have been found are used for training.
(optional, default: None)
class_label: If the num_train_instances option is used, this string determines the class whose training examples are counted.
(optional, default: 'Target')
random: If False, the order of the data is retained, i.e., the first X percent or the first requested number of training instances are used for training. If True, the training data is sampled randomly without regard to the order of the data.
(optional, default: True)
Exemplary call
- node : InstanceSelection
  parameters :
      train_percentage_selected : 80
      test_percentage_selected : 100
      reduce_class : Standard
Author: Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)
Created: 2010/03/31
POSSIBLE NODE NAMES:
- InstanceSelection
- InstanceSelectionNode
- RandomInstanceSelection
POSSIBLE INPUT TYPES:
- PredictionVector
- FeatureVector
- TimeSeries
Class Components Summary
_execute (time_series) |
get_num_data (iterator) | Return a list of instances that contain num_train_instances many instances of class class_label
input_types |
request_data_for_testing () | Returns data for testing of subsequent nodes
request_data_for_training (use_test_data) | Returns data for training of subsequent nodes
- __init__(train_percentage_selected=100, test_percentage_selected=100, reduce_class=False, num_train_instances=None, class_label='Target', random=True, **kwargs)
- get_num_data(iterator)
Return a list of instances that contain num_train_instances many instances of class class_label and all other instances that occur up to this point
- request_data_for_training(use_test_data)
Returns data for training of subsequent nodes
Note
This method works differently in InstanceSelectionNode than in other nodes: only the selected percentage (train_percentage_selected) of the available data is returned.
- input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']
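The behavior documented for get_num_data, collecting instances in order until num_train_instances instances of class_label have been seen, might look roughly like the following. This is a sketch under assumptions, not the pySPACE source: the function name take_until_class_count and the (data, label) pair representation are invented for illustration.

```python
def take_until_class_count(instances, num_train_instances, class_label="Target"):
    """Collect instances in order until enough of `class_label` were seen.

    `instances` is any iterable of (data, label) pairs. Hypothetical
    helper, not part of pySPACE.
    """
    collected = []
    found = 0
    for data, label in instances:
        if found >= num_train_instances:
            # Enough instances of the counted class: stop consuming input.
            break
        # Instances of other classes seen on the way are kept as well.
        collected.append((data, label))
        if label == class_label:
            found += 1
    return collected
```

If the input runs out before the requested count is reached, this sketch simply returns everything it has seen.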
ReduceOverrepresentedClassNode
- class pySPACE.missions.nodes.data_selection.instance_selection.ReduceOverrepresentedClassNode(**kwargs)
Bases: pySPACE.missions.nodes.base_node.BaseNode
Reject instances to balance categories for classification
The node forwards only a reduced number of the training and test instances of the bigger class to get a balanced ratio of the classes. The forwarded instances are selected randomly. All data of the underrepresented class is forwarded.
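The balancing idea can be illustrated with a short Python sketch. It is not the node's actual code: the helper name balance_classes, the seed argument, the (data, label) pair representation, and the interpretation of "balanced ratio" as equal instance counts per class are assumptions of this example.

```python
import random
from collections import defaultdict


def balance_classes(instances, seed=None):
    """Randomly drop instances of larger classes until all classes match.

    `instances` is a list of (data, label) pairs. Hypothetical helper,
    not part of pySPACE.
    """
    rng = random.Random(seed)

    # Group the positions of the instances by class label.
    by_label = defaultdict(list)
    for index, (_, label) in enumerate(instances):
        by_label[label].append(index)
    if not by_label:
        return []

    # Size of the underrepresented class; it is forwarded completely.
    smallest = min(len(indices) for indices in by_label.values())

    kept = []
    for indices in by_label.values():
        if len(indices) > smallest:
            # Reject a random surplus of the overrepresented class.
            indices = rng.sample(indices, smallest)
        kept.extend(indices)

    # Assumption of this sketch: survivors are returned in original order.
    return [instances[i] for i in sorted(kept)]
```

For 3 'Target' and 9 'Standard' instances, the sketch returns all 3 'Target' instances and a random 3 of the 'Standard' ones.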
Parameters
Exemplary call
- node : Reduce_Overrepresented_Class
Author: Hendrik Woehrle (hendrik.woehrle@dfki.de)
Created: 2010/09/22
POSSIBLE NODE NAMES:
- ReduceOverrepresentedClass
- Reduce_Overrepresented_Class
- ReduceOverrepresentedClassNode
POSSIBLE INPUT TYPES:
- PredictionVector
- FeatureVector
- TimeSeries
Class Components Summary
_execute (time_series) |
balance_instances (all_instances) | Method that performs the rejections of the data in the oversized class
input_types |
request_data_for_testing () | Returns data for testing of subsequent nodes
request_data_for_training (use_test_data) | Returns data for training of subsequent nodes
- balance_instances(all_instances)
Method that performs the rejections of the data in the oversized class
- input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']