comp_analysis

Module: missions.operations.comp_analysis

Creates various comparison plots on several levels for a PerformanceResultSummary

This module contains implementations of an operation and a process for analyzing data contained in a csv file (typically the result of a Weka Classification Operation).

A CompAnalysisProcess evaluates the effect of several parameters on a set of metrics. For each numeric parameter, each pair of numeric parameters and each nominal parameter, one plot is created for each metric.
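The combinations described above can be enumerated as in the following minimal sketch. It is not taken from the pySPACE source: the function name enumerate_plots, the plot-kind labels, and the example parameter and metric names are assumptions made for illustration, and the split into numeric and nominal parameters is taken as given.

```python
from itertools import combinations


def enumerate_plots(numeric_params, nominal_params, metrics):
    """List the (kind, parameters, metric) combinations that would be plotted.

    Hypothetical helper: it only mirrors the textual description and does
    not call any pySPACE code.
    """
    plots = []
    for metric in metrics:
        # One plot per numeric parameter.
        for param in numeric_params:
            plots.append(("numeric", (param,), metric))
        # One plot per unordered pair of numeric parameters.
        for pair in combinations(numeric_params, 2):
            plots.append(("numeric_vs_numeric", pair, metric))
        # One plot per nominal parameter.
        for param in nominal_params:
            plots.append(("nominal", (param,), metric))
    return plots


plots = enumerate_plots(
    numeric_params=["num_features", "complexity"],
    nominal_params=["classifier"],
    metrics=["accuracy", "f_measure"],
)
```

With two numeric parameters, one nominal parameter and two metrics, this yields four combinations per metric.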

Furthermore, for each value of each parameter, the rows of the data where the specific parameter takes on the specific value are selected and the same analysis is done for this subset recursively.
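The recursive selection of subsets can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: select_rows and walk_subsets are hypothetical names, the data is assumed to be a dictionary of equally long columns, and the sketch assumes that a parameter is dropped from the list once it has been fixed to a value.

```python
def select_rows(data_dict, key, value):
    """Return a new column dictionary with only the rows where data_dict[key] == value."""
    keep = [i for i, v in enumerate(data_dict[key]) if v == value]
    return {k: [column[i] for i in keep] for k, column in data_dict.items()}


def walk_subsets(data_dict, parameters, path=()):
    """Yield (path, subset) for the given data and, recursively, for every subset.

    path is a tuple of (parameter, value) pairs describing which parameters
    have been fixed; it could be mapped to a subdirectory name.
    """
    yield path, data_dict
    for index, param in enumerate(parameters):
        # Assumption: a parameter that has been fixed is not varied again.
        remaining = parameters[:index] + parameters[index + 1:]
        for value in sorted(set(data_dict[param])):
            subset = select_rows(data_dict, param, value)
            yield from walk_subsets(subset, remaining, path + ((param, value),))


data = {
    "classifier": ["svm", "svm", "lda", "lda"],
    "complexity": [1, 10, 1, 10],
    "accuracy": [0.8, 0.9, 0.7, 0.75],
}
subsets = dict(walk_subsets(data, ["classifier", "complexity"]))
```

Each key of subsets identifies one level of the analysis; the entry for (("classifier", "svm"),) holds only the rows of that classifier.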

This is useful for large experiments in which several parameters are varied. For instance, to analyze the performance for certain settings of certain parameters, one can find all plots in the respective subdirectories. If one is interested only in the performance of one classifier, one can go into the subdirectory of that classifier.

The “Comp Analysis Operation” is similar to the “Analysis Operation”; the only difference is that plots corresponding to the same parameters but different metrics are created in one single pdf file. This ensures better comparability and lower computation time, as fewer files are created.

Inheritance diagram for pySPACE.missions.operations.comp_analysis:


Class Summary

CompAnalysisOperation(processes, ...[, ...]) Operation for analyzing and plotting data in compressed format
CompAnalysisProcess(result_dir, data_dict, ...) Process for analyzing and plotting data

Classes

CompAnalysisOperation

class pySPACE.missions.operations.comp_analysis.CompAnalysisOperation(processes, operation_spec, result_directory, number_processes, create_process=None)[source]

Bases: pySPACE.missions.operations.base.Operation

Operation for analyzing and plotting data in compressed format

A CompAnalysisOperation is similar to an AnalysisOperation:

An AnalysisOperation loads the data from a csv-file (typically the result of a Weka Classification Operation) and evaluates the effect of various parameters on several metrics.

Class Components Summary

_createProcesses(processes, result_dir, ...) Recursive function that is used to create the analysis processes
_numberOfProcesses(number_of_processes, ...) Recursive function to determine the number of processes that will be created for the given number_of_parameter_values
consolidate()
create(operation_spec, result_directory[, ...]) A factory method that creates an Analysis operation based on the information given in the operation specification operation_spec.
__init__(processes, operation_spec, result_directory, number_processes, create_process=None)[source]
classmethod create(operation_spec, result_directory, debug=False, input_paths=[])[source]

A factory method that creates an Analysis operation based on the information given in the operation specification operation_spec. If debug is True, the Analysis Processes are not created in a separate thread.

classmethod _numberOfProcesses(number_of_processes, number_of_parameter_values)[source]

Recursive function to determine the number of processes that will be created for the given number_of_parameter_values

classmethod _createProcesses(processes, result_dir, data_dict, parameters, dep_par, metrics, logscale, markertype, top_level)[source]

Recursive function that is used to create the analysis processes

Each process creates one plot for each numeric parameter, each pair of numeric parameters, and each nominal parameter based on the data contained in the data_dict. The results are stored in result_dir. The method calls itself recursively for each value of each parameter.

consolidate()[source]

CompAnalysisProcess

class pySPACE.missions.operations.comp_analysis.CompAnalysisProcess(result_dir, data_dict, parameters, metrics, logscale, markertype)[source]

Bases: pySPACE.missions.operations.base.Process

Process for analyzing and plotting data

A CompAnalysisProcess is quite similar to an AnalysisProcess:

A CompAnalysisProcess evaluates the effect of several parameters on a set of metrics. For each numeric parameter, each pair of numeric parameters and each nominal parameter, one plot is created for all metrics (instead of one plot for each metric as in AnalysisProcess).

Expected parameters
result_dir : The directory in which the actual results are stored.

data_dict : A dictionary containing all the data. The dictionary contains a mapping from an attribute (e.g. accuracy) to a list of values taken by an attribute. An entry is the entirety of all i-th values over all dict-values.

parameters : The parameters which have been varied during the experiment and whose effect on the metrics should be investigated. These must be keys of the data_dict.

metrics : The metrics that should be evaluated. Must be keys of the data_dict.

logscale : Boolean; a numeric x-axis is scaled logarithmically if true.

markertype : A string like ‘.’ defining the marker type for certain plots. Default is ‘x’.
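The column-oriented layout of data_dict can be illustrated with a small example. The attribute names and values below are invented, and the helpers entry and check_keys are hypothetical; they only show what "the entirety of all i-th values" and "must be keys of the data_dict" mean in practice.

```python
# Each key is an attribute; each list holds that attribute's value per result row.
data_dict = {
    "classifier": ["svm", "svm", "lda"],
    "num_features": [10, 20, 10],
    "accuracy": [0.81, 0.86, 0.78],
}


def entry(data_dict, i):
    """Return the i-th entry, i.e. the i-th value of every attribute."""
    return {key: values[i] for key, values in data_dict.items()}


def check_keys(data_dict, parameters, metrics):
    """Return the parameters and metrics that are missing from data_dict."""
    return [key for key in list(parameters) + list(metrics) if key not in data_dict]


second_entry = entry(data_dict, 1)
missing = check_keys(data_dict, ["classifier", "num_features"], ["accuracy", "f_measure"])
```

Here second_entry describes one experimental run, and missing lists "f_measure" because no such column exists in the example data.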

Class Components Summary

__call__() Executes this process on the respective modality
_plot_nominal(data, result_dir, fig1, ax, ...) Creates a boxplot of the y_keys for the given nominal parameter x_key.
_plot_numeric(data, result_dir, fig1, ax, ...) Creates a plot of the y_keys for the given numeric parameter x_key.
_plot_numeric_vs_nominal(data, result_dir, ...) Plot for comparison of several different values of a nominal parameter
_plot_numeric_vs_numeric(data, result_dir, ...) Contour plot of the value_keys for the two numeric parameters axis_keys.
_scalar_metric(metric, numeric_parameters, ...) Creates the plots for a scalar metric
_sequence_metric(metric, numeric_parameters, ...) Creates the plots for a sequence metric
__init__(result_dir, data_dict, parameters, metrics, logscale, markertype)[source]
__call__()[source]

Executes this process on the respective modality

_scalar_metric(metric, numeric_parameters, nominal_parameters, fig1, ax, im)[source]

Creates the plots for a scalar metric

_sequence_metric(metric, numeric_parameters, nominal_parameters, fig1, ax, mwa_window_length)[source]

Creates the plots for a sequence metric

_plot_numeric(data, result_dir, fig1, ax, x_key, y_key, conditions=[], one_figure=False, show_errors=False)[source]

Creates a plot of the y_keys for the given numeric parameter x_key.

Creates a plot that visualizes the effect of varying one variable on a second one (e.g. the effect of varying the number of features on the accuracy).

Expected parameters

data : A dictionary that contains a mapping from an attribute (e.g. accuracy) to a list of values taken by an attribute. An entry is the entirety of all i-th values over all dict-values.

result_dir : The directory in which the plots will be saved.

x_key : The key of the dictionary whose values should be used as values for the x-axis (the independent variables)

y_key : The key of the dictionary whose values should be used as values for the y-axis, i.e. the dependent variables

conditions : A list of functions that must all be fulfilled for an entry to be used in the plot. Each function has to take two arguments: the data dictionary containing all entries and the index of the entry that should be checked. Each condition must return a boolean value.

one_figure : If true, all curves are plotted in the same figure. Otherwise, for each value of curve_key, a new figure is generated (currently ignored)

show_errors : If true, error bars are plotted
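The conditions parameter can be illustrated with a short sketch. The two-argument signature (data dictionary, entry index) follows the description above; the helper filter_entries and the example data are hypothetical and only show how such a list of conditions could be evaluated.

```python
def filter_entries(data, conditions):
    """Return the indices of all entries for which every condition holds."""
    number_of_entries = len(next(iter(data.values())))
    return [
        i
        for i in range(number_of_entries)
        if all(condition(data, i) for condition in conditions)
    ]


data = {
    "classifier": ["svm", "svm", "lda", "svm"],
    "num_features": [10, 20, 20, 40],
    "accuracy": [0.81, 0.86, 0.78, 0.88],
}

# Each condition takes the whole data dictionary and an entry index.
conditions = [
    lambda data, i: data["classifier"][i] == "svm",
    lambda data, i: data["num_features"][i] >= 20,
]

selected = filter_entries(data, conditions)
```

An empty list of conditions selects every entry, since all() of an empty sequence is true.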

_plot_numeric_vs_numeric(data, result_dir, fig1, ax, im, axis_keys, value_key)[source]

Contour plot of the value_keys for the two numeric parameters axis_keys.

Creates a contour plot that visualizes the effect of varying two variables on a third one (e.g. the effect of varying the lower and upper cutoff frequency of a bandpass filter on the accuracy).

Expected parameters

data : A dictionary that contains a mapping from an attribute (e.g. accuracy) to a list of values taken by an attribute. An entry is the entirety of all i-th values over all dict-values.

result_dir: The directory in which the plots will be saved.

axis_keys: The two keys of the dictionary that are assumed to have an effect on a third variable (the dependent variable)

value_key: The dependent variables whose values determine the color of the contour plot
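A contour plot needs the dependent values arranged on a grid spanned by the two axis parameters. The sketch below shows one way to build such a grid from the column dictionary. It is an assumption-laden illustration: grid_means is a hypothetical name, averaging repeated (x, y) combinations is an assumed aggregation, and the actual method may prepare its data differently.

```python
from collections import defaultdict
from statistics import mean


def grid_means(data, axis_keys, value_key):
    """Arrange the mean of value_key on a grid over the two axis parameters.

    Returns the sorted x values, the sorted y values and a list of rows
    (one row per y value). Cells without data are None.
    """
    x_key, y_key = axis_keys
    cells = defaultdict(list)
    for x, y, value in zip(data[x_key], data[y_key], data[value_key]):
        cells[(x, y)].append(value)
    xs = sorted(set(data[x_key]))
    ys = sorted(set(data[y_key]))
    grid = [
        [mean(cells[(x, y)]) if (x, y) in cells else None for x in xs]
        for y in ys
    ]
    return xs, ys, grid


data = {
    "lower_cutoff": [1, 1, 4, 1],
    "upper_cutoff": [10, 20, 10, 10],
    "accuracy": [0.5, 0.75, 0.25, 1.0],
}
xs, ys, grid = grid_means(data, ("lower_cutoff", "upper_cutoff"), "accuracy")
```

The resulting grid could then be passed to a contour plotting routine, with xs and ys as the axis coordinates.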

_plot_numeric_vs_nominal(data, result_dir, fig1, ax, numeric_key, nominal_key, value_key)[source]

Plot for comparison of several different values of a nominal parameter

Creates a plot that visualizes the effect of varying one numeric parameter on the performance for several different values of a nominal parameter.

Expected parameters

data : A dictionary that contains a mapping from an attribute (e.g. accuracy) to a list of values taken by an attribute. An entry is the entirety of all i-th values over all dict-values.

result_dir: The directory in which the plots will be saved.

numeric_key: The numeric parameter whose effect (together with the nominal parameter) on the dependent variable should be investigated.

nominal_key: The nominal parameter whose effect (together with the numeric parameter) on the dependent variable should be investigated.

value_key : The dependent variables whose values determine the color of the contour plot

_plot_nominal(data, result_dir, fig1, ax, x_key, y_key)[source]

Creates a boxplot of the y_keys for the given nominal parameter x_key.

Creates a plot that visualizes the effect of varying one nominal variable on a second one (e.g. the effect of varying the classifier on the accuracy).

Expected parameters

data: A dictionary that contains a mapping from an attribute (e.g. accuracy) to a list of values taken by an attribute. An entry is the entirety of all i-th values over all dict-values.

result_dir: The directory in which the plots will be saved.

x_key: The key of the dictionary whose values should be used as values for the x-axis (the independent variables)

y_key: The key of the dictionary whose values should be used as values for the y-axis, i.e. the dependent variable
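A boxplot over a nominal parameter needs the y values grouped by the distinct values of x_key. The following sketch shows such a grouping on invented data; group_by_nominal is a hypothetical helper and not part of the documented class.

```python
def group_by_nominal(data, x_key, y_key):
    """Group the values of y_key by the nominal value of x_key in the same entry."""
    groups = {}
    for x_value, y_value in zip(data[x_key], data[y_key]):
        groups.setdefault(x_value, []).append(y_value)
    return groups


data = {
    "classifier": ["svm", "lda", "svm", "lda", "svm"],
    "accuracy": [0.81, 0.78, 0.86, 0.74, 0.88],
}
groups = group_by_nominal(data, "classifier", "accuracy")
```

Each list in groups would correspond to one box, labelled with the nominal value it belongs to.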