average_and_feature_vis

Module: missions.nodes.visualization.average_and_feature_vis

Visualize average of time series and time domain features

Additional features can be added to the visualization.


AverageFeatureVisNode

class pySPACE.missions.nodes.visualization.average_and_feature_vis.AverageFeatureVisNode(load_feature_path='None', axflip=False, alternative_scaling=False, physiological_arrangement=False, shrink_plots=False, important_feature_thresh=0.0, percentage_of_features=100, emotiv=False, error_type='2StdError', **kwargs)[source]

Bases: pySPACE.missions.nodes.base_node.BaseNode

Visualize time domain features in the context of average time series.

This node visualizes features from any feature selection algorithm in the context of the training data. This data is some kind of time series, either channelwise “plain” EEG time series or preprocessed data, e.g. the time series of CSP pseudo channels.

The purpose is to investigate two main issues:

  1. By comparing the mean time series of standard and target time windows, is it understandable why certain features have been selected?
  2. Comparing the time series from one set to the selected features from some other set, are the main features robust?

If no features are passed to this node, it will still visualize average time series in any case. Only the time series that are labeled as training data will be taken into account. The reason is that the primary aim of this node is to visualize the features on the very data they were chosen from, i.e., the training data. If instead all data is to be plotted (e.g., at the end of a preprocessing flow), one would in the worst case have to run the node chain twice. In the extra run for the visualization, an All_Train_Splitter would be used prior to this node.

This is what this node will plot:

  • At a position in the node chain where the current data object is a time series, it will plot the average of all training samples of the current time series.
  • If the current object is not a time series, this node will go back in the data object’s history until it finds a time series. This time series will then be used.
  • If a path to a features.pickle is passed using the load_feature_path variable, then these features will be used for plotting.
  • If no load_feature_path is set, this node will check if the current data object has a data.predictor.features entry. This will be the case if the previous node has been a classifier. If so, these features will be used.
  • If features are found in neither of the aforementioned two locations, no features will be plotted. The average time series however will still be plotted.
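
The lookup order for features described in the list above can be sketched as follows. This is a minimal illustration, not the node's actual code: the helper name resolve_features is hypothetical, and the only structure assumed of the data object is the data.predictor.features entry mentioned above.

```python
import pickle


def resolve_features(load_feature_path, data):
    """Return the features to plot, or None if none are available.

    Hypothetical helper that mirrors the documented lookup order.
    """
    # 1. An explicit pickle path takes precedence. The documented default is
    #    the string 'None', so both None and 'None' mean "no path given".
    if load_feature_path not in (None, "None"):
        with open(load_feature_path, "rb") as handle:
            return pickle.load(handle)
    # 2. Otherwise look for features attached by a preceding classifier node.
    predictor = getattr(data, "predictor", None)
    # 3. If neither source exists, None is returned and only the average
    #    time series would be plotted.
    return getattr(predictor, "features", None)
```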

Parameters

load_feature_path:
 

Path to the stored pickle file containing the selected features. So far, LibSVM and 1-Norm SVM nodes can deliver this output. Defaults to ‘None’, which implies that no features are plotted. The average time series are plotted anyway.

(optional, default: None)

error_type:

Selects which type of error is drawn into the average time series plots:

None: No errors
‘SampleStdDev’: +- 1 sample standard deviation. Under Gaussian assumptions, this is the area in which 68% of the samples lie.
‘StdError’: +- 1 standard error of the mean. Under Gaussian assumptions, this is the area in which the sample mean will end up in 68% of all cases.

If multiples of these quantities are desired, simply use the multiplier as a prefix in the string. With multiplier 2, the above percentages change to 95%. With multiplier 3, they change to 99.7%.

Here are examples for valid entries: ‘2SampleStdDev’, None, ‘StdError’, ‘2StdError’, ‘1.7StdError’

(optional, default: ‘2StdError’)
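
As an illustration of the string format, the sketch below splits an error_type value into multiplier and error kind, then computes the half-width of the error band with NumPy. The function names parse_error_type and error_band are hypothetical and the node's own parsing may differ. The sketch assumes the samples are stacked in an array of shape (n_samples, n_time_points).

```python
import re

import numpy as np

# Optional numeric prefix followed by one of the two documented error kinds.
_ERROR_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)?(SampleStdDev|StdError)$")


def parse_error_type(error_type):
    """Split e.g. '1.7StdError' into (1.7, 'StdError'); None means no errors."""
    if error_type is None:
        return None
    match = _ERROR_PATTERN.match(error_type)
    if match is None:
        raise ValueError("unsupported error_type: %r" % (error_type,))
    multiplier = float(match.group(1)) if match.group(1) else 1.0
    return multiplier, match.group(2)


def error_band(samples, error_type):
    """Half-width of the error band per time point (hypothetical helper)."""
    parsed = parse_error_type(error_type)
    if parsed is None:
        return None
    multiplier, kind = parsed
    samples = np.asarray(samples, dtype=float)
    band = np.std(samples, axis=0, ddof=1)  # sample standard deviation
    if kind == "StdError":
        band = band / np.sqrt(samples.shape[0])  # standard error of the mean
    return multiplier * band
```

The band would then be drawn as mean +- error_band(...) around each averaged time series.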

axflip:

If axflip is True, the y-axes of the averaged time series plots are reversed. This makes the plots look the way psychologists (and even some neuroscientists) are used to.

(optional, default: False)

alternative_scaling:
 

If False, the values from the loaded feature file (i.e. the “w” in the SVM notation) are used directly for both the graphical feature representation and the rating of “feature importance”. If True, the product of these values and the difference of the averaged time domain feature values of both classes is used instead:

    importance(i) = w(i) * (avg_target(i) - avg_standard(i))

On the one hand, using the feature averages implicitly assumes normally distributed features. On the other hand, this computation takes into account that different features have different value ranges. The eventual classification with SVMs is done by evaluating sum_i{ w(i) * feature(i) }. In that sense, the importance defined here measures the average contribution of a certain feature to the classification function. As such, and that’s the essential point, it makes the values comparable.

(optional, default: False)
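
A minimal NumPy sketch of the importance formula above. The function name feature_importance and the array layout (one row per sample, one column per feature) are assumptions made for illustration only.

```python
import numpy as np


def feature_importance(w, target_features, standard_features):
    """importance(i) = w(i) * (avg_target(i) - avg_standard(i))."""
    w = np.asarray(w, dtype=float)
    # Average each feature over all samples of the respective class.
    avg_target = np.mean(target_features, axis=0)
    avg_standard = np.mean(standard_features, axis=0)
    return w * (avg_target - avg_standard)
```

Summing the resulting values over all features gives the difference between the mean values of sum_i{ w(i) * feature(i) } for the two classes, which is why the individual terms can be read as per-feature contributions.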

physiological_arrangement:
 

If False all time series plots are arranged in a matrix of plots. If set to True, the plots are arranged according to the arrangement of the electrodes on the scalp. Obviously, this only makes sense if the investigated time series are not spatially filtered. CSP pseudo channels, e.g., can’t be arranged on the scalp.

(optional, default: False)

shrink_plots:

Defaults to False. Set it to True whenever channels from the 64 electrode cap are investigated jointly with electrodes from the 128 electrode cap that do not appear on the 64 electrode cap. This avoids overlapping plots in the physiological arrangement.

(optional, default: False)

important_feature_thresh:
 

Gives a threshold below which features are not considered important. Only important features will appear in the plots. Defaults to 0, i.e. all non-zero features are important. This parameter collides with percentage_of_features; the stricter restriction applies.

(optional, default: 0.0)

percentage_of_features:
 

Defines the percentage of features to be drawn in the plots. Defaults to 100, i.e. all features are to be used. This parameter collides with important_feature_thresh; the stricter restriction applies. Thus, even in the default case, fewer than 100% of the features will usually be drawn, because the important_feature_thresh parameter excludes features that are zero. Note that the given percentage relates to the total number of features, not to the number of features a classifier has used in some sense.

(optional, default: 100)
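
How important_feature_thresh and percentage_of_features might combine is sketched below. The details are assumptions, not taken from the node's source: importance is judged by absolute value, the threshold comparison is strict, and the percentage is turned into a count by truncation. The function name select_important_features is hypothetical.

```python
import numpy as np


def select_important_features(values, important_feature_thresh=0.0,
                              percentage_of_features=100):
    """Return indices of the features to draw, most important first."""
    magnitudes = np.abs(np.asarray(values, dtype=float))
    # Restriction 1: the magnitude must exceed the threshold. With the
    # default of 0 this keeps exactly the non-zero features.
    above_thresh = np.flatnonzero(magnitudes > important_feature_thresh)
    # Restriction 2: at most this share of the TOTAL number of features.
    max_count = int(len(magnitudes) * percentage_of_features / 100.0)
    # Rank the survivors by magnitude, largest first, and apply the limit,
    # so whichever restriction is stricter determines the result.
    order = np.argsort(-magnitudes[above_thresh], kind="stable")
    return above_thresh[order][:max_count]
```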

emotiv:

Use the emotiv parameter if the data was acquired with the Emotiv EPOC system. This only changes the position of text in the plots; it has no other visible effect.

(optional, default: False)

Known Issues

The title of physiologically arranged time series plots vanishes if no frontal channels are plotted, because the plot gets trimmed and the title is cut off with it.

Exemplary Call

- 
    node : AverageFeatureVis
    parameters : 
        load_feature_path : "/path/to/my/features.pickle"
        alternative_scaling : True
        physiological_arrangement : True
        axflip : True
        shrink_plots : False
        important_feature_thresh : 0.3
        percentage_of_features : 20
        error_type : "2SampleStdDev"
Author:

David Feess (David.Feess@dfki.de)

Created:

2010/02/10

Reviewed:

2011/06/24

POSSIBLE NODE NAMES:
 
  • AverageFeatureVisNode
  • AverageFeatureVis
POSSIBLE INPUT TYPES:
 
  • TimeSeries
  • PredictionVector

Class Components Summary

_adjust_spines(ax, spines, i_chan) Essentially, removes most of the axes in the feature development plots.
_execute(data) Nothing to be done here
_format_subplots(type, i_chan, ...[, ec]) Some time series plot formatting.
_generate_feature_development_plots(...) This function generates the feature development plots
_generate_histo_plot() This function generates the actual histogram plot
_generate_labeled_correlation_matrix(label) Concatenates the feature names to the actual correlation matrices.
_generate_time_series_plot() This function generates the actual time series plot
_get_corr_plot(corr_matrix, label) Plot the current correlation matrix as filled contour plot and return figure instance.
_stop_training([debug]) Finish the training.
_train(data, label) Add the given data point along with its class label to the training set.
get_last_timeseries_from_history(data)
input_types
is_supervised() Returns whether this node requires supervised training
is_trainable() Returns whether this node is trainable.
store_state(result_dir[, index]) Stores all generated plots in the given directory result_dir
input_types = ['TimeSeries', 'PredictionVector']
__init__(load_feature_path='None', axflip=False, alternative_scaling=False, physiological_arrangement=False, shrink_plots=False, important_feature_thresh=0.0, percentage_of_features=100, emotiv=False, error_type='2StdError', **kwargs)[source]
is_trainable()[source]

Returns whether this node is trainable.

is_supervised()[source]

Returns whether this node requires supervised training

_execute(data)[source]

Nothing to be done here

get_last_timeseries_from_history(data)[source]
_train(data, label)[source]

Add the given data point along with its class label to the training set, i.e. update ‘mean’ time series and append to the complete data.
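
The accumulate-then-divide scheme behind _train and _stop_training can be illustrated with a small stand-alone class. This is a sketch under assumptions, not the node's implementation: the class name ClassAverager is hypothetical, and each sample is assumed to be an array of identical shape (e.g. time points x channels).

```python
import numpy as np


class ClassAverager:
    """Accumulate samples per class label and average them at the end."""

    def __init__(self):
        self.sums = {}    # label -> running sum of all samples
        self.counts = {}  # label -> number of samples seen

    def train(self, data, label):
        data = np.asarray(data, dtype=float)
        if label not in self.sums:
            self.sums[label] = np.zeros_like(data)
            self.counts[label] = 0
        self.sums[label] += data
        self.counts[label] += 1

    def stop_training(self):
        # Divide each accumulated sum by the number of samples per condition.
        return {label: self.sums[label] / self.counts[label]
                for label in self.sums}
```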

_stop_training(debug=False)[source]

Finish the training, i.e. for the time series plots: take the accumulated time series and divide by the number of samples per condition. For the

store_state(result_dir, index=None)[source]

Stores all generated plots in the given directory result_dir

_generate_labeled_correlation_matrix(label)[source]

Concatenates the feature names to the actual correlation matrices. This is for better overview in stored txt files later on.

_generate_time_series_plot()[source]

This function generates the actual time series plot

_format_subplots(type, i_chan, samples_per_condition_string, ec=None)[source]

Some time series plot formatting. Mainly writes the channel names into the axes, sets titles and rearranges the axes for physiological_arrangement. Also flips axes if desired by setting axflip = True.

_generate_histo_plot()[source]

This function generates the actual histogram plot

_generate_feature_development_plots(important_features)[source]

This function generates the feature development plots

_get_corr_plot(corr_matrix, label)[source]

Plot the current correlation matrix as filled contour plot and return figure instance.

_adjust_spines(ax, spines, i_chan)[source]

Essentially, removes most of the axes in the feature development plots. Also produces the alternating shift of the left axes.

convert_feature_vector_to_time_series()

pySPACE.missions.nodes.visualization.average_and_feature_vis.convert_feature_vector_to_time_series(feature_vector, sample_data)[source]

Parse the feature name and reconstruct a time series object holding the equivalent data

In a feature vector object, a feature is determined by the feature name and the feature value. When dealing with time domain features, the feature name is a concatenation of the (pseudo-) channel name and the time within an epoch in seconds. A typical feature name reads, e.g., “TD_F7_0.960sec”.
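
The naming scheme can be illustrated with a small parser. This is only a sketch: the function name parse_time_domain_feature_name is hypothetical, and it assumes the layout “TD_<channel>_<time>sec” shown in the example above.

```python
def parse_time_domain_feature_name(feature_name):
    """Split e.g. 'TD_F7_0.960sec' into the channel name and time in seconds."""
    # Split off the prefix first, then the time from the right, so that
    # (pseudo-) channel names containing underscores stay intact.
    prefix, rest = feature_name.split("_", 1)
    channel, time_part = rest.rsplit("_", 1)
    if prefix != "TD" or not time_part.endswith("sec"):
        raise ValueError("not a time domain feature name: %r" % (feature_name,))
    return channel, float(time_part[:-len("sec")])
```

With the channel name and the time in hand, each feature value can be written to the matching channel and time point of a time series that has the same layout as the sample data.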