parameter_optimization

Module: missions.nodes.meta.parameter_optimization

Determine the optimal parameterization of a subflow

Inheritance diagram for pySPACE.missions.nodes.meta.parameter_optimization:


Class Summary

ParameterOptimizationBase(flow_template[, ...]) Base class for parameter optimization nodes
GridSearchNode([ranges, grid]) Grid search for optimizing the parameterization of a subflow
PatternSearchNode([start, directions, ...]) Extension of the standard Pattern Search algorithm

Classes

ParameterOptimizationBase

class pySPACE.missions.nodes.meta.parameter_optimization.ParameterOptimizationBase(flow_template, variables=[], metric='Balanced_accuracy', std_weight=0, inverse_metric=False, runs=1, nominal_ranges=None, debug=False, validation_parameter_settings={}, final_training_parameter_settings={}, **kwargs)[source]

Bases: pySPACE.missions.nodes.base_node.BaseNode

Base class for parameter optimization nodes

The overall goal is to determine the parameterization of a subflow that maximizes a given metric.

Nodes derived from this class can determine the optimal parameterization (variables) of a subpart of a flow (nodes) fully autonomously. For instance, for different numbers of features retained during feature selection, different complexities/regularization coefficients might be optimal for a classifier. In order to determine which feature number is optimal, the classifier's complexity must be chosen separately for each feature number.

First of all, the training data is split into training and validation data, e.g., by cross-validation (validation_set). Then, for a chosen set of parameters, the data is processed as described in the specification (nodes). Finally, the classifier is evaluated (evaluation) using another node, a combination of metrics, or, in future implementations, something else such as a derivative. The nodes specification should therefore always include a classifier to be evaluated.

This procedure may be repeated until a good set of parameters is found. The algorithm that defines the way of choosing parameters determines the node name (e.g., PatternSearchNode), and parameters specific to the optimization procedure can be passed via the optimization spec. Furthermore, general functionality, especially the parts belonging to the evaluation of a parameter, is provided in this base class (e.g., a function mapping parameter dictionaries to keys and a function to create a grid from given parameters).

When a good parameterization is found, the nodes are trained with these parameters on the whole data set.

Note

In the future, alternatives could be added, e.g., combining the different flows of the cross-validation with a simple ensemble classifier.

Note

If you want to optimize parameters for each sub-split, this should not be done in this node.

Parameters

Note that the definition of the parameters of this node is structured into main parameters and sub-parameters, describing the different aspects of parameter optimization, so take care of the indentation.

optimization:

As mentioned above, this parameter dictionary is used by the specific subclasses, i.e., by the specific optimization algorithms. See the subclasses for documentation of the possible parameters.

parallelization:

This parameter dictionary is used for parallelization of subflow execution. Possible parameters so far are processing_modality, pool_size and batch_size. See SubflowHandler for more information.

validation_set:
splits:

The number of splits used in an internal cross-validation loop. Note that more splits lead to better estimates of the parameterization's performance but also increase computation time considerably.

(recommended, default: 5)

split_node:

If the standard CV-splitter with the given number of splits is not to be used, an alternative splitter node can be specified here in YAML node syntax.

(optional, default: CV_Splitter)

runs:

Number of internal runs used for evaluation. Nodes such as the CV_Splitter behave differently in every run, so the calculation is repeated runs times with a different randomizer each time. The random seed used in each repetition is:

10 * external_run_number + internal_run_number

The resulting performance measure is calculated from the performances of the different (internal) runs and splits as average - w * standard deviation, where w is the std_weight.

(optional, default: 1)
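
The seeding scheme can be written as a minimal sketch (the helper name internal_seed is hypothetical, not part of the library):

# Sketch of the seeding scheme stated above; internal_seed is a
# hypothetical helper, not part of pySPACE.
def internal_seed(external_run_number, internal_run_number):
    return 10 * external_run_number + internal_run_number

# e.g. external run 2 with 3 internal runs -> seeds [20, 21, 22]
seeds = [internal_seed(2, r) for r in range(3)]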

randomize:

Change the CV splitter with every parameter evaluation step.

Note

Not yet implemented

evaluation:

Specification of the sink node and the corresponding evaluation function

performance_sink_node:

Specify a different sink node in YAML node syntax. Otherwise the default Classification_Performance_Sink is used, together with the ir_class parameter described below.

(optional, default: Classification_Performance_Sink)

ir_class:

The class name (as string) for which IR statistics are computed.

(recommended, default: ‘Target’)

metric:

This is the metric that should be maximized. Every metric computed by the performance_sink_node can be used, for instance “Balanced_accuracy”, “AUC”, “F_measure”, or even soft metrics or loss metrics.

(recommended, default: ‘Balanced_accuracy’)

inverse_metric:

If True, metric values are multiplied by -1. In this way, loss metrics can be used for parameter optimization.

(optional, boolean, default: False)

std_weight:

Cross-validation gives several values for the estimated performance. Therefore, the quantity expected value minus std_weight times standard deviation is used.

(optional, default: 0)
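
The optimized quantity can be sketched as follows (an illustrative helper under the stated formula, not the library's code):

import numpy as np

# Mean minus std_weight times standard deviation over all internal
# runs and splits (illustrative sketch, not library code).
def aggregated_performance(metric_values, std_weight=0):
    values = np.asarray(metric_values, dtype=float)
    return values.mean() - std_weight * values.std()

aggregated_performance([0.80, 0.84, 0.78], std_weight=1)  # ca. 0.78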

variables:

List of the parameters to be optimized and replaced in the node_spec.

nodes:

The original specification of the nodes that should be optimized. The value of the “nodes” parameter must be a standard NodeChain definition in YAML syntax (properly indented).

nominal_ranges:

Similar to the ranges in the grid search, a grid can be specified, mainly for nominal parameters. All other parameters are then optimized depending on these nominal parameters. Afterwards, the resulting performance values are compared to choose the best nominal parameter combination. When the results are stored, each nominal parameter combination is stored separately.

(optional, default: None)
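
As a hypothetical illustration, a nominal grid over kernel types could look as follows; the remaining parameters are then optimized once per kernel type:

# Hypothetical example: the kernel type is a nominal parameter; the
# numeric parameters are optimized separately for each kernel.
nominal_ranges = {"~~KERNEL~~": ["LINEAR", "POLY"]}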

debug:

Switch on debug messages, where available.

(optional, default: False)

validation_parameter_settings:

Dictionary of parameter mappings to be replaced in the nodes during the validation phase of the parameter optimization.

This works together with the final_training_parameter_settings.

(optional, default: dict())

final_training_parameter_settings:

Dictionary of parameter mappings to be replaced in the nodes after the validation phase of the parameter optimization, i.e., in the final training phase.

This works together with the validation_parameter_settings.

A very important use case of these parameters is to switch off the retrain mode in the validation phase but nevertheless have it active in the final subflow or node.

(optional, default: dict())
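
A minimal sketch of the retrain use case mentioned above, assuming the affected nodes understand a boolean retrain parameter:

# Retraining is switched off while validating parameter candidates, but
# switched on for the final training (sketch; parameter name assumed).
validation_parameter_settings = {"retrain": False}
final_training_parameter_settings = {"retrain": True}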

Author: Mario Krell (mario.krell@dfki.de)
Created: 2011/08/03
LastChange: 2012/09/03 Anett Seeland - new structure for parallelization

Class Components Summary

_execute(data) Execute the flow on the given data vector data
_get_flow() Method introduced for consistency with flow_node
_inc_train(data[, class_label]) Iterate through the nodes to train them
_stop_training() Do the optimization step and define final parameter choice
_train(data, label) Train the flow on the given data vector data
check_parameters(param_spec) Check input parameters for existence and appropriateness
get_best_dict_entry(performance_dict) Find the highest performance value in the dictionary
get_best_parametrization() Apply optimization algorithm
get_output_type(input_type[, as_string]) Returns the output type of the entire flow
get_previous_transformations([sample]) Recursively construct a list of (linear) transformations
get_sensor_ranking() Get the sensor ranking from the optimized trained flow
is_retrainable() Retraining if one node in subflow is retrainable
is_supervised() Return whether this node requires supervised training
is_trainable() Return whether this node is trainable
p2key(parameters) Map parameter dictionary to hashable tuple (key for dictionary)
prepare_optimization() Initialize optimization procedure
present_label(label) Forward the label to the subflow
re_init() Reset optimization params
search_grid(parameter_ranges) Combine each parameter in parameter ranges to a grid via cross product
store_state(result_dir[, index]) Store this node in the given directory result_dir
__init__(flow_template, variables=[], metric='Balanced_accuracy', std_weight=0, inverse_metric=False, runs=1, nominal_ranges=None, debug=False, validation_parameter_settings={}, final_training_parameter_settings={}, **kwargs)[source]
is_trainable()[source]

Return whether this node is trainable

is_supervised()[source]

Return whether this node requires supervised training

static check_parameters(param_spec)[source]

Check input parameters for existence and appropriateness

_train(data, label)[source]

Train the flow on the given data vector data

_stop_training()[source]

Do the optimization step and define final parameter choice

This is the main method of this node!

_execute(data)[source]

Execute the flow on the given data vector data

This method is used in offline mode and for delivering the training data for the next node. Otherwise, request_data_for_testing is used.

get_output_type(input_type, as_string=True)[source]

Returns the output type of the entire flow

_get_flow()[source]

Method introduced for consistency with flow_node

This node itself is not a real flow_node, since the final flow is unknown during initialization and only specified during the optimization process.

_inc_train(data, class_label=None)[source]

Iterate through the nodes to train them

The optimal parameter remains fixed, and the nodes in the optimal flow get the incremental training.

Here it is important to know that first the node is changed and then the changed data is forwarded to the next node. This differs from the normal offline retraining scheme.

is_retrainable()[source]

Retraining if one node in subflow is retrainable

present_label(label)[source]

Forward the label to the subflow

buffering must be set to True only for the main node, for incremental learning in an application (live environment). The inner nodes must not set this parameter.

store_state(result_dir, index=None)[source]

Store this node in the given directory result_dir

get_sensor_ranking()[source]

Get the sensor ranking from the optimized trained flow

get_previous_transformations(sample=None)[source]

Recursively construct a list of (linear) transformations

Overwrites BaseNode function, since meta node needs to get transformations from subflow.

re_init()[source]

Reset optimization params

Subclasses can overwrite this method if necessary, e.g., in case some parameters have to be reinitialized if several optimizations are done.

prepare_optimization()[source]

Initialize optimization procedure

Subclasses can overwrite this method if necessary.

get_best_parametrization()[source]

Apply optimization algorithm

This method has to be implemented in the subclass.

get_best_dict_entry(performance_dict)[source]

Find the highest performance value in the dictionary

static search_grid(parameter_ranges)[source]

Combine each parameter in parameter ranges to a grid via cross product

static p2key(parameters)[source]

Map parameter dictionary to hashable tuple (key for dictionary)
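
The behavior of the last two helpers can be sketched as follows (illustrative reimplementations of the documented behavior, not the library's actual code):

import itertools

def search_grid(parameter_ranges):
    # Cross product of all parameter ranges, returned as a list of
    # parameter dictionaries (illustrative sketch, not library code).
    keys = sorted(parameter_ranges)
    value_lists = [parameter_ranges[k] for k in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]

def p2key(parameters):
    # Map a parameter dictionary to a hashable, order-independent tuple,
    # usable as a key in a performance dictionary (illustrative sketch).
    return tuple(sorted(parameters.items()))

grid = search_grid({"~~OUTLIERS~~": [0, 5], "~~COMPLEXITY~~": [0.01, 0.1]})
performance_dict = {p2key(point): None for point in grid}  # 4 grid points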

GridSearchNode

class pySPACE.missions.nodes.meta.parameter_optimization.GridSearchNode(ranges=None, grid=None, *args, **kwargs)[source]

Bases: pySPACE.missions.nodes.meta.parameter_optimization.ParameterOptimizationBase, pySPACE.environments.chains.node_chain.SubflowHandler

Grid search for optimizing the parameterization of a subflow

For each parameter a list of candidate values is specified (ranges). The cross product of all values in ranges is computed and a subflow is evaluated for each of these parameterizations, using cross-validation on the training data; finally the best point in the search grid is chosen as the optimal point.

Parameters

This algorithm does not need the variables parameter, since the variables are already included in the ranges parameter.

ranges:

A dictionary mapping each parameter to the values it should be tested for. If more than one parameter is given, the cross product of all parameter values is computed (i.e., each combination). For each resulting parameter combination, the flow specified in the YAML syntax is evaluated. The parameter names should be used somewhere in this YAML definition and should be unique, since the instantiation is based on pure textual replacement (see the sketch after the grid example below). It is common to enforce this by starting and ending the parameter names with “~~”. In the example below, the two parameters are called “~~OUTLIERS~~” and “~~COMPLEXITY~~”, each having 3 values. This results in 9 parameter combinations to be tested.

grid:

From the ranges parameter a grid is generated. Alternatively, the grid can be specified directly using this parameter, as a list of dictionaries.

For example, a ranges parameterization like ‘{~~OUTLIERS~~ : [0, 5, 10], ~~COMPLEXITY~~ : [0.01, 0.1, 1.0]}’ corresponds to:

- ~~OUTLIERS~~ : 0
  ~~COMPLEXITY~~ : 0.01
- ~~OUTLIERS~~ : 0
  ~~COMPLEXITY~~ : 0.1
- ~~OUTLIERS~~ : 0
  ~~COMPLEXITY~~ : 1
- ~~OUTLIERS~~ : 5
  ~~COMPLEXITY~~ : 0.01
- ~~OUTLIERS~~ : 5
  ~~COMPLEXITY~~ : 0.1
- ~~OUTLIERS~~ : 5
  ~~COMPLEXITY~~ : 1
- ~~OUTLIERS~~ : 10
  ~~COMPLEXITY~~ : 0.01
- ~~OUTLIERS~~ : 10
  ~~COMPLEXITY~~ : 0.1
- ~~OUTLIERS~~ : 10
  ~~COMPLEXITY~~ : 1
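
The uniqueness requirement on parameter names becomes clear in a sketch of the textual replacement (illustrative only, not the library's code):

# Every occurrence of a parameter name in the YAML flow template is
# replaced by the candidate value before the subflow is constructed.
template = "outlier_percentage : ~~OUTLIERS~~"
for value in (0, 5, 10):
    instantiated = template.replace("~~OUTLIERS~~", str(value))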

Exemplary Call

-
    node : Grid_Search
    parameters :
        optimization:
            ranges : {~~OUTLIERS~~ : [0, 5, 10],
                      ~~COMPLEXITY~~ : [0.01, 0.1, 1.0]}
        parallelization:
            processing_modality : 'backend'
            pool_size : 2
        validation_set :
            split_node :
                node : CV_Splitter
                parameters :
                    splits : 10
                    stratified : True
                    time_dependent : True
        evaluation:
            metric : "Balanced_accuracy"
            std_weight: 1
            performance_sink_node :
                node : Sliding_Window_Performance_Sink
                parameters :
                    ir_class : "Movement"
                    classes_names : ['NoMovement','Movement']
                    uncertain_area : 'eval([(-600,-350)])'
                    calc_soft_metrics : True
                    save_score_plot : True 
        variables: [~~OUTLIERS~~, ~~COMPLEXITY~~]
        nodes :
            -
                node : Feature_Normalization
                parameters :
                    outlier_percentage : ~~OUTLIERS~~
            -  
                node: LibSVM_Classifier
                parameters :
                    complexity : ~~COMPLEXITY~~
                    class_labels : ['NoMovement', 'Movement']
                    weight : [1.0, 2.0]
                    kernel_type : 'LINEAR'
POSSIBLE NODE NAMES:
 
  • GridSearchNode
  • Grid_Search
  • GridSearch
POSSIBLE INPUT TYPES:
 
  • PredictionVector
  • FeatureVector
  • TimeSeries

Class Components Summary

get_best_parametrization() Evaluate each flow-parameterization by running a cross validation
input_types
node_from_yaml(node_spec) Create the node based on the node_spec
__init__(ranges=None, grid=None, *args, **kwargs)[source]
static node_from_yaml(node_spec)[source]

Create the node based on the node_spec

get_best_parametrization()[source]

Evaluate each flow-parameterization by running a cross validation on the training data for grid search

input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']
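
The selection performed by get_best_parametrization amounts to a loop like the following sketch (illustrative only; evaluate stands in for one internal cross-validated subflow evaluation):

# Pick the best point of a grid given as a list of parameter dictionaries
# (illustrative sketch, not the library's implementation).
def best_grid_point(evaluate, grid):
    performances = {tuple(sorted(p.items())): evaluate(p) for p in grid}
    best = max(performances, key=performances.get)
    return dict(best), performances[best]

grid = [{"~~OUTLIERS~~": o, "~~COMPLEXITY~~": c}
        for o in (0, 5, 10) for c in (0.01, 0.1, 1.0)]
params, score = best_grid_point(lambda p: -p["~~COMPLEXITY~~"], grid)  # dummy metric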

PatternSearchNode

class pySPACE.missions.nodes.meta.parameter_optimization.PatternSearchNode(start=[], directions=[], start_step_size=1.0, stop_step_size=1e-10, scaling_factor=0.5, up_scaling_factor=1, max_iter=inf, max_bound=[], min_bound=[], red_pars=[], **kwargs)[source]

Bases: pySPACE.missions.nodes.meta.parameter_optimization.ParameterOptimizationBase, pySPACE.environments.chains.node_chain.SubflowHandler

Extension of the standard Pattern Search algorithm

For the main principle see: Numerical Optimization, Jorge Nocedal & Stephen J. Wright.

Special Components

  • No double calculation of already visited points
  • Step size increased when a better point is found, to speed up the search
  • Possible limit on the number of iteration steps, to be comparable to grid search
  • Internal cross-validation cycle

Parameters

The following parameters have to be specified in the optimization spec. For the algorithm, the variables parameter is important, since it fixes the order of the parameters and thereby simplifies the specification of vectors corresponding to points, directions, or bounds. These can be specified as dictionaries, lists, or tuples; internally they are transformed to arrays with the method get_vector. The transformation back to keys, for filling the values into the node chains, is done by v2d.

start:

Starting point of the algorithm. For SVM optimization, the complexity has to be sufficiently small.

(recommended, default: ones(dimension))

directions:

List of directions that are evaluated around the current best point.

(optional, default: unit directions)

start_step_size:

First value to scale the direction vectors

(optional, default: 1.0)

stop_step_size:

Minimal value to scale the direction vectors

If the step size falls below this value, the algorithm stops.

(optional, default: 1e-10)

scaling_factor:

When the evaluations do not deliver a better point, the current scaling of the directions is reduced by the scaling factor. Otherwise it is increased by the up_scaling_factor.

(optional, default: 0.5)

up_scaling_factor:

If the evaluation gives a better point, the step size is increased by this factor. With the default of 1, there is no up-scaling.

(optional, default: 1)

max_iter:

If the total number of evaluations of the directions exceeds this value, the algorithm stops as well.

(optional, default: infinity)

max_bound:

Points exceeding these bounds are not evaluated.

(optional, default: inf*array(ones(dimension)))

min_bound:

Points falling below these bounds are not evaluated.

(optional, default: -inf*array(ones(dimension)))

Exemplary Call

-
    node : Pattern_Search
    parameters : 
        parallelization :
            processing_modality : 'local'
            pool_size : 4
        optimization:
            start_step_size : 0.002
            start : [0.005,0.01]
            directions : [[-1,-1],[1,1],[1,-1],[-1,1]]
            stop_step_size : 0.00001
            scaling_factor : 0.25
            min_bound : [0,0]
            max_bound : [10,10]
            max_iter : 100
        validation_set :
            split_node :
                node : CV_Splitter
                parameters :
                    splits : 5
            runs : 2
        evaluation:
            metric : "Balanced_accuracy"
            std_weight: 1
            ir_class : "Target" 
        variables: [~~W1~~, ~~W2~~]
        nodes :
            -
                node: LibSVM_Classifier
                parameters :
                    complexity : 1
                    class_labels : ['Standard', 'Target']
                    weight : [~~W1~~, ~~W2~~]
                    kernel_type : 'LINEAR'
POSSIBLE NODE NAMES:
 
  • PatternSearch
  • PatternSearchNode
  • Pattern_Search
POSSIBLE INPUT TYPES:
 
  • PredictionVector
  • FeatureVector
  • TimeSeries

Class Components Summary

get_best_parametrization() Perform pattern search
get_vector(v_spec) Transform list, tuple or dict to an array/vector
input_types
node_from_yaml(node_spec) Create the node based on the node_spec
prepare_optimization() Calculate initial performance value
ps_step() Single descent step of the pattern search algorithm
re_init() Reset search for optimum
v2d(v) Transform vector to dictionary using self.variables
__init__(start=[], directions=[], start_step_size=1.0, stop_step_size=1e-10, scaling_factor=0.5, up_scaling_factor=1, max_iter=inf, max_bound=[], min_bound=[], red_pars=[], **kwargs)[source]
static node_from_yaml(node_spec)[source]

Create the node based on the node_spec

re_init()[source]

Reset search for optimum

prepare_optimization()[source]

Calculate initial performance value

get_vector(v_spec)[source]

Transform list, tuple or dict to an array/vector

v2d(v)[source]

Transform vector to dictionary using self.variables
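
Both conversions can be sketched as follows (illustrative, not the library's code), assuming the variables list fixes the parameter order:

import numpy as np

variables = ["~~W1~~", "~~W2~~"]  # order of the optimized parameters

def get_vector(v_spec):
    # Transform a list, tuple, or dict into an array in `variables` order.
    if isinstance(v_spec, dict):
        return np.array([v_spec[name] for name in variables], dtype=float)
    return np.array(v_spec, dtype=float)

def v2d(v):
    # Transform a vector back into a parameter dictionary.
    return dict(zip(variables, v))

get_vector({"~~W2~~": 2.0, "~~W1~~": 1.0})  # -> array([1., 2.])
v2d([1.0, 2.0])  # -> {'~~W1~~': 1.0, '~~W2~~': 2.0}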

get_best_parametrization()[source]

Perform pattern search

Evaluate the set of directions around the current best solution. If a better solution is found, a new evaluation of directions is started there. Otherwise, the length of the directions is reduced by the scaling factor.

input_types = ['PredictionVector', 'FeatureVector', 'TimeSeries']
ps_step()[source]

Single descent step of the pattern search algorithm
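
Putting the pieces together, the search described above can be sketched as follows (illustrative only, not the library's implementation; evaluate stands in for the cross-validated metric, and the bound checks are omitted):

import numpy as np

def pattern_search(evaluate, start, directions, start_step_size=1.0,
                   stop_step_size=1e-10, scaling_factor=0.5,
                   up_scaling_factor=1.0, max_iter=float("inf")):
    best_x = np.asarray(start, dtype=float)
    best_f = evaluate(best_x)
    visited = {tuple(best_x)}  # no double calculation of visited points
    step, evaluations = start_step_size, 0
    while step > stop_step_size and evaluations < max_iter:
        improved = False
        for d in directions:  # evaluate directions around the current best
            x = best_x + step * np.asarray(d, dtype=float)
            key = tuple(x)
            if key in visited:
                continue
            visited.add(key)
            evaluations += 1
            f = evaluate(x)
            if f > best_f:  # maximization of the metric
                best_x, best_f, improved = x, f, True
        # increase the step size on success, shrink it otherwise
        step *= up_scaling_factor if improved else scaling_factor
    return best_x, best_f

# Toy usage: maximize -(x - 1)^2 starting at the origin.
best, score = pattern_search(lambda x: -np.sum((x - 1.0) ** 2),
                             start=[0.0, 0.0],
                             directions=[[1, 0], [-1, 0], [0, 1], [0, -1]],
                             max_iter=500)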