instance_selection
Module: missions.nodes.data_selection.instance_selection
Select only a part of the instances
Inheritance diagram for pySPACE.missions.nodes.data_selection.instance_selection:
Class Summary
InstanceSelectionNode([...]) | Retain only a certain percentage of the instances
ReduceOverrepresentedClassNode(**kwargs) | Reject instances to balance categories for classification
Classes
InstanceSelectionNode
class pySPACE.missions.nodes.data_selection.instance_selection.InstanceSelectionNode(train_percentage_selected=100, test_percentage_selected=100, reduce_class=False, num_train_instances=None, class_label='Target', random=True, **kwargs)
Bases: pySPACE.missions.nodes.base_node.BaseNode
Retain only a certain percentage of the instances
The node InstanceSelectionNode forwards only train_percentage_selected percent of the training instances passed to it to the successor node, and only test_percentage_selected percent of the test instances. The forwarded instances are selected randomly, but in such a way that the class ratio is kept.
If reduce_class is used, only the chosen class is reduced, without keeping the class ratio. In that case, the total amount of reduced data does not match the percentage values.
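The class-ratio-preserving selection described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the pySPACE implementation: the function name select_percentage, the representation of instances as (sample, label) pairs, and the rounding rule are all hypothetical.

```python
import random
from collections import defaultdict


def select_percentage(instances, percentage, rng=None):
    """Keep about `percentage` percent of the instances of every class.

    `instances` is assumed to be a list of (sample, label) pairs.
    This helper is hypothetical and only mirrors the documented behaviour.
    """
    if rng is None:
        rng = random.Random()
    # Group the instances by class label.
    by_label = defaultdict(list)
    for sample, label in instances:
        by_label[label].append((sample, label))
    selected = []
    for items in by_label.values():
        # Draw the same fraction from each class so the class ratio is kept.
        n_keep = int(round(len(items) * percentage / 100.0))
        selected.extend(rng.sample(items, n_keep))
    return selected


data = [(i, "Target") for i in range(20)] + [(i, "Standard") for i in range(80)]
subset = select_percentage(data, 80, rng=random.Random(0))
n_target = sum(1 for _, label in subset if label == "Target")
n_standard = sum(1 for _, label in subset if label == "Standard")
```

Because the same fraction is drawn from each class, the 1:4 ratio of the toy data is unchanged in the subset.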
- Parameters
train_percentage_selected: The percentage of training instances which is forwarded to successor node.
(optional, default: 100)
test_percentage_selected: The percentage of test instances which is forwarded to successor node.
(optional, default: 100)
reduce_class: If only one class should be reduced, set this parameter to that class; otherwise, both classes are reduced in a balanced fashion.
(optional, default: False)
num_train_instances: Instead of specifying train_percentage_selected, this option specifies the absolute number of training instances of class class_label that should be in the training set. All instances that occur until num_train_instances instances of that class are found are used for training.
(optional, default: None)
class_label: If the num_train_instances option is used, this string determines the class whose training examples are counted.
(optional, default: 'Target')
random: If False, the order of the data is retained, i.e., the first X percent (or the first X instances) of the training data are used for training. If True, the training data is sampled randomly without taking the order of the data into consideration.
(optional, default: True)
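The effect of the random parameter can be illustrated with a small sketch. The helper select_fraction and its arguments are hypothetical names chosen for this example, and the sketch deliberately ignores class ratios to keep the contrast between ordered and random selection visible.

```python
import random


def select_fraction(instances, percentage, use_random=True, seed=None):
    """Return `percentage` percent of `instances` (hypothetical helper).

    With use_random=False the leading part of the data is kept in order;
    with use_random=True the same number of instances is sampled randomly.
    """
    n_keep = int(round(len(instances) * percentage / 100.0))
    if not use_random:
        # Order is retained: simply take the first n_keep instances.
        return list(instances[:n_keep])
    rng = random.Random(seed)
    # Order is ignored: sample without replacement.
    return rng.sample(list(instances), n_keep)


ordered = select_fraction(list(range(10)), 30, use_random=False)
sampled = select_fraction(list(range(10)), 30, use_random=True, seed=1)
```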
Exemplary call
- node : InstanceSelection
  parameters :
      train_percentage_selected : 80
      test_percentage_selected : 100
      reduce_class : Standard
Author: Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)
Created: 2010/03/31
POSSIBLE NODE NAMES: - InstanceSelection
- InstanceSelectionNode
- RandomInstanceSelection
POSSIBLE INPUT TYPES: - PredictionVector
- FeatureVector
- TimeSeries
Class Components Summary
_execute(time_series) |
get_num_data(iterator) | Return a list of instances that contain num_train_instances many instances of class class_label and all other instances that occur up to this point
input_types |
request_data_for_testing() | Returns data for testing of subsequent nodes
request_data_for_training(use_test_data) | Returns data for training of subsequent nodes
__init__(train_percentage_selected=100, test_percentage_selected=100, reduce_class=False, num_train_instances=None, class_label='Target', random=True, **kwargs)
get_num_data(iterator)
Return a list of instances that contains num_train_instances instances of class class_label and all other instances that occur up to this point
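The counting behaviour described for get_num_data can be sketched as follows. This is a rough illustration, not the actual method: the function name take_until_label_count and the assumption that the iterator yields (sample, label) pairs are hypothetical.

```python
def take_until_label_count(iterator, num_train_instances, class_label="Target"):
    """Collect instances until enough instances of `class_label` are seen.

    `iterator` is assumed to yield (sample, label) pairs. Instances of
    other classes that occur before the limit is reached are kept as well.
    """
    collected = []
    found = 0
    for sample, label in iterator:
        if found >= num_train_instances:
            # Enough instances of the counted class were found: stop here.
            break
        collected.append((sample, label))
        if label == class_label:
            found += 1
    return collected


stream = [
    ("a", "Standard"),
    ("b", "Target"),
    ("c", "Standard"),
    ("d", "Target"),
    ("e", "Standard"),
]
training = take_until_label_count(iter(stream), 2)
```

In this toy stream, the second 'Target' instance is "d", so everything up to and including "d" is kept and "e" is dropped.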
request_data_for_training(use_test_data)
Returns data for training of subsequent nodes
Note
This method works differently in InstanceSelectionNode than in other nodes: only the selected percentage of the available data is returned.
input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']
ReduceOverrepresentedClassNode
class pySPACE.missions.nodes.data_selection.instance_selection.ReduceOverrepresentedClassNode(**kwargs)
Bases: pySPACE.missions.nodes.base_node.BaseNode
Reject instances to balance categories for classification
The node forwards only a reduced number of the training and test instances of the bigger class to get a balanced ratio of the classes. The forwarded instances are selected randomly. All data of the underrepresented class is forwarded.
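A minimal sketch of this kind of balancing is shown below. It assumes that "balanced" means equal instance counts per class and that instances are (sample, label) pairs; both are assumptions made for illustration, and balance_classes is a hypothetical helper rather than the node's balance_instances method.

```python
import random
from collections import defaultdict


def balance_classes(instances, rng=None):
    """Randomly reject instances of larger classes (hypothetical helper).

    Every class is reduced to the size of the smallest class; the
    smallest class itself is forwarded completely.
    """
    if rng is None:
        rng = random.Random()
    by_label = defaultdict(list)
    for sample, label in instances:
        by_label[label].append((sample, label))
    if not by_label:
        return []
    n_min = min(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        if len(items) > n_min:
            # Overrepresented class: keep a random subset of size n_min.
            items = rng.sample(items, n_min)
        balanced.extend(items)
    return balanced


data = [(i, "Target") for i in range(5)] + [(i, "Standard") for i in range(20)]
balanced = balance_classes(data, rng=random.Random(0))
kept_targets = sorted(s for s, label in balanced if label == "Target")
kept_standards = [s for s, label in balanced if label == "Standard"]
```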
Parameters
Exemplary call
- node : Reduce_Overrepresented_Class
Author: Hendrik Woehrle (hendrik.woehrle@dfki.de)
Created: 2010/09/22
POSSIBLE NODE NAMES: - ReduceOverrepresentedClass
- Reduce_Overrepresented_Class
- ReduceOverrepresentedClassNode
POSSIBLE INPUT TYPES: - PredictionVector
- FeatureVector
- TimeSeries
Class Components Summary
_execute(time_series) |
balance_instances(all_instances) | Method that performs the rejections of the data in the oversized class
input_types |
request_data_for_testing() | Returns data for testing of subsequent nodes
request_data_for_training(use_test_data) | Returns data for training of subsequent nodes
balance_instances(all_instances)
Method that performs the rejections of the data in the oversized class
input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']