average_and_feature_vis
Module: missions.nodes.visualization.average_and_feature_vis
Visualize the average of time series and time domain features.
Additional features can be added to the visualization.
[Figure: inheritance diagram for pySPACE.missions.nodes.visualization.average_and_feature_vis]
AverageFeatureVisNode
class pySPACE.missions.nodes.visualization.average_and_feature_vis.AverageFeatureVisNode(load_feature_path='None', axflip=False, alternative_scaling=False, physiological_arrangement=False, shrink_plots=False, important_feature_thresh=0.0, percentage_of_features=100, emotiv=False, error_type='2StdError', **kwargs)
Bases: pySPACE.missions.nodes.base_node.BaseNode
Visualize time domain features in the context of average time series.
This node is supposed to visualize features from any feature selection algorithm in the context of the training data. This data is some kind of time series, either channelwise “plain” EEG time series or somehow preprocessed data, e.g. the time series of CSP pseudo channels.
The purpose is to investigate two main issues:
- By comparing the mean time series of standard and target time windows, is it understandable why certain features have been selected?
- Comparing the time series from one set to the selected features from some other set, are the main features robust?
If no features are passed to this node, it will still visualize the average time series. Only the time series that are labeled as training data will be taken into account. The reason is that the primary aim of this node is to visualize the features on the very data they were chosen from, i.e., the training data. If instead all data is to be plotted (e.g., at the end of a preprocessing flow), one would in the worst case have to run the node chain twice. In the extra run for the visualization, an All_Train_Splitter would be used prior to this node.
This is what this node will plot:
- In a position in the node chain where the current data object is a time series, it will plot the average of all training samples of the current time series.
- If the current object is not a time series, this node will go back in the data object’s history until it finds a time series. This time series will then be used.
- If a path to a features.pickle is passed using the load_feature_path variable, then these features will be used for plotting.
- If no load_feature_path is set, this node will check if the current data object has a data.predictor.features entry. This will be the case if the previous node has been a classifier. If so, these features will be used.
- If features are found in neither of the aforementioned two locations, no features will be plotted. The average time series however will still be plotted.
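The lookup order for features described above can be sketched as follows. This is only an illustration and not the node's actual code: the function name resolve_features and the returned source labels are hypothetical, and the only attribute taken from the description is data.predictor.features.

```python
import pickle
from types import SimpleNamespace


def resolve_features(load_feature_path, data):
    """Return (features, source) following the order described above.

    Hypothetical helper for illustration; not part of pySPACE.
    """
    # The node's signature uses the string 'None' as default, so both the
    # string and the Python None object are treated as "not set" here.
    if load_feature_path not in (None, "None"):
        with open(load_feature_path, "rb") as handle:
            return pickle.load(handle), "file"
    # Fall back to features attached by a preceding classifier node.
    predictor = getattr(data, "predictor", None)
    features = getattr(predictor, "features", None)
    if features is not None:
        return features, "predictor"
    # Nothing found: only the average time series would be plotted.
    return None, "none"


# Stand-ins for data objects with and without classifier features.
with_features = SimpleNamespace(predictor=SimpleNamespace(features=[0.2, -0.7]))
without_features = SimpleNamespace()
print(resolve_features("None", with_features))
print(resolve_features("None", without_features))
```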
Parameters
load_feature_path: Path to the stored pickle file containing the selected features. So far, LibSVM and 1-Norm SVM nodes can deliver this output. Defaults to ‘None’, which implies that no features are plotted. The average time series are plotted anyway.
(optional, default: None)
error_type: Selects which type of error is drawn into the average time series plots:
- None: No errors
- ‘SampleStdDev’: +- 1 sample standard deviation. Under Gaussian assumptions, this is the area in which 68% of the samples lie.
- ‘StdError’: +- 1 standard error of the mean. Under Gaussian assumptions, this is the area in which the sample mean will end up in 68% of all cases.
If multiples of these quantities are desired, simply use them as a prefix in the strings. With multiplier 2, the above percentages change to 95%; with multiplier 3, they change to 99.7%.
Here are examples for valid entries: ‘2SampleStdDev’, None, ‘StdError’, ‘2StdError’, ‘1.7StdError’
(optional, default: ‘2StdError’)
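To make the error_type string format concrete, here is a minimal sketch of how such a string could be split into a multiplier and an error kind, and how the half-width of the error band at one time point follows from it. The helper names parse_error_type and error_band are hypothetical; the node's own parsing may differ.

```python
import math
import re
import statistics


def parse_error_type(error_type):
    """Split e.g. '2StdError' into (2.0, 'StdError'); None stays None."""
    if error_type is None:
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)?(SampleStdDev|StdError)", error_type)
    if match is None:
        raise ValueError("unknown error_type: %r" % (error_type,))
    # A missing prefix means a multiplier of 1.
    multiplier = float(match.group(1)) if match.group(1) else 1.0
    return multiplier, match.group(2)


def error_band(samples, error_type):
    """Half-width of the error band for the values at one time point."""
    parsed = parse_error_type(error_type)
    if parsed is None:
        return None  # no error band requested
    multiplier, kind = parsed
    # Sample standard deviation (n - 1 in the denominator); needs >= 2 values.
    spread = statistics.stdev(samples)
    if kind == "StdError":
        # Standard error of the mean shrinks with the number of samples.
        spread /= math.sqrt(len(samples))
    return multiplier * spread


values_at_one_time_point = [1.0, 2.0, 3.0, 4.0]
for entry in ("2SampleStdDev", None, "StdError", "2StdError", "1.7StdError"):
    print(entry, error_band(values_at_one_time_point, entry))
```

With four samples the standard error is half the sample standard deviation, so ‘2StdError’ and ‘SampleStdDev’ give the same band width in this toy case.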
axflip: If axflip is True, the y-axes of the averaged time series plots are reversed. This makes the plots look the way psychologists (and even some neuroscientists) are used to.
(optional, default: False)
alternative_scaling: If False, the values from the loaded feature file (i.e. the “w” in the SVM notation) are directly used for both graphical feature representation and rating of “feature importance”. If True, the product of these values and the difference of the averaged time domain feature values of both classes is used instead:
    importance(i) = w(i) * (avg_target(i) - avg_standard(i))
On the one hand, using the feature averages implicitly assumes normally distributed features. On the other hand, this computation takes into account the fact that different features have different value ranges. The eventual classification with SVMs is done by evaluating sum_i{ w(i) * feature(i) }. In that sense, the importance defined here measures the average contribution of a certain feature to the classification function. As such, and that’s the essential point, it makes the values comparable.
(optional, default: False)
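The alternative scaling formula can be tried on toy numbers. The sketch below uses a hypothetical helper feature_importance and made-up values; it only restates the formula above and is not taken from the node's code.

```python
def feature_importance(weights, avg_target, avg_standard):
    """importance(i) = w(i) * (avg_target(i) - avg_standard(i))."""
    return [w * (t - s) for w, t, s in zip(weights, avg_target, avg_standard)]


# Made-up values for three features.
weights = [0.5, -2.0, 0.0]          # "w" from the SVM
avg_target = [4.0, 1.0, 10.0]       # class averages of each feature
avg_standard = [2.0, 1.5, 3.0]

importance = feature_importance(weights, avg_target, avg_standard)
print(importance)  # [1.0, 1.0, 0.0]

# The importances sum to the difference between the classification sums
# evaluated at the two class averages.
score_target = sum(w * t for w, t in zip(weights, avg_target))
score_standard = sum(w * s for w, s in zip(weights, avg_standard))
print(sum(importance), score_target - score_standard)  # 2.0 2.0
```

Note how the second feature, despite a small difference between the class averages, ends up as important as the first because of its larger weight, while the third contributes nothing regardless of its large value range.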
physiological_arrangement: If False all time series plots are arranged in a matrix of plots. If set to True, the plots are arranged according to the arrangement of the electrodes on the scalp. Obviously, this only makes sense if the investigated time series are not spatially filtered. CSP pseudo channels, e.g., can’t be arranged on the scalp.
(optional, default: False)
shrink_plots: Defaults to False and is supposed to be set to True whenever channels from the 64 electrode cap are investigated jointly with electrodes from the 128 electrode cap that do not appear on the 64 electrode cap. This avoids overlapping of the plots in physiological arrangement.
(optional, default: False)
important_feature_thresh: Gives a threshold below which features are not considered important. Only important features will appear in the plots. Defaults to 0, i.e. all non-zero features are important. This parameter collides with percentage_of_features; the stricter restriction applies.
(optional, default: 0.0)
percentage_of_features: Defines the percentage of features to be drawn in the plots. Defaults to 100, i.e. all features are to be used. This parameter collides with important_feature_thresh; the stricter restriction applies. Thus, even in the default case, fewer than 100% of the features will be drawn most of the time, due to the non-zero condition of the important_feature_thresh parameter. Note that the given percentage is in relation to the total number of features, not in relation to the number of features a classifier has used in some sense.
(optional, default: 100)
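How the two restrictions could combine is sketched below. Several details are assumptions made for illustration and are not confirmed by the description: features are ranked by absolute value, the threshold comparison is strict, and the allowed count is rounded down. The helper name select_important_features is hypothetical.

```python
def select_important_features(weights, thresh=0.0, percentage=100):
    """Return indices of features passing both restrictions, strongest first."""
    # Rank all features by absolute value (assumption).
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]),
                    reverse=True)
    # The percentage refers to the total number of features.
    max_count = int(len(weights) * percentage / 100.0)
    # With thresh=0.0 this keeps exactly the non-zero features.
    passing = [i for i in ranked if abs(weights[i]) > thresh]
    # The stricter of the two restrictions wins.
    return passing[:max_count]


weights = [0.0, 0.9, -0.4, 0.1, 0.0]
print(select_important_features(weights))                  # [1, 2, 3]
print(select_important_features(weights, thresh=0.3))      # [1, 2]
print(select_important_features(weights, percentage=20))   # [1]
```

The first call shows the default case: only three of the five features are drawn, because two of them are zero.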
emotiv: Use the emotiv parameter if the data was acquired with the Emotiv EPOC system. This will just change the position of text in the plots; it’s not visible otherwise.
(optional, default: False)
Known Issues
The title of physiologically arranged time series plots vanishes if no frontal channels are plotted, because the plot gets trimmed and the title is trimmed along with it.
Exemplary Call
-
    node : AverageFeatureVis
    parameters :
        load_feature_path : "/path/to/my/features.pickle"
        alternative_scaling : True
        physiological_arrangement : True
        axflip : True
        shrink_plots : False
        important_feature_thresh : 0.3
        percentage_of_features : 20
        error_type : "2SampleStdDev"
Author: David Feess (David.Feess@dfki.de)
Created: 2010/02/10
Reviewed: 2011/06/24
POSSIBLE NODE NAMES:
- AverageFeatureVisNode
- AverageFeatureVis
POSSIBLE INPUT TYPES:
- TimeSeries
- PredictionVector
Class Components Summary
_adjust_spines(ax, spines, i_chan): Essentially, removes most of the axes in the feature development plots.
_execute(data): Nothing to be done here
_format_subplots(type, i_chan, ...[, ec]): Some time series plot formatting.
_generate_feature_development_plots(...): This function generates the actual histogram plot
_generate_histo_plot(): This function generates the actual histogram plot
_generate_labeled_correlation_matrix(label): Concatenates the feature names to the actual correlation matrices.
_generate_time_series_plot(): This function generates the actual time series plot
_get_corr_plot(corr_matrix, label): Plot the current correlation matrix as filled contour plot and return figure instance.
_stop_training([debug]): Finish the training, i.e.
_train(data, label): Add the given data point along with its class label to the training set, i.e.
get_last_timeseries_from_history(data)
input_types
is_supervised(): Returns whether this node requires supervised training
is_trainable(): Returns whether this node is trainable.
store_state(result_dir[, index]): Stores all generated plots in the given directory result_dir
input_types = ['TimeSeries', 'PredictionVector']
__init__(load_feature_path='None', axflip=False, alternative_scaling=False, physiological_arrangement=False, shrink_plots=False, important_feature_thresh=0.0, percentage_of_features=100, emotiv=False, error_type='2StdError', **kwargs)
_train(data, label)
    Add the given data point along with its class label to the training set, i.e. update ‘mean’ time series and append to the complete data.
_stop_training(debug=False)
    Finish the training, i.e. for the time series plots: take the accumulated time series and divide by the number of samples per condition. For the
store_state(result_dir, index=None)
    Stores all generated plots in the given directory result_dir
_generate_labeled_correlation_matrix(label)
    Concatenates the feature names to the actual correlation matrices. This is for better overview in stored txt files later on.
_format_subplots(type, i_chan, samples_per_condition_string, ec=None)
    Some time series plot formatting. Mainly writes the channel names into the axes, sets titles and rearranges the axes for physiological_arrangement. Also flips axes if desired by setting axflip = True
_generate_feature_development_plots(important_features)
    This function generates the actual histogram plot
convert_feature_vector_to_time_series()
pySPACE.missions.nodes.visualization.average_and_feature_vis.convert_feature_vector_to_time_series(feature_vector, sample_data)
Parse the feature name and reconstruct a time series object holding the equivalent data.
In a feature vector object, a feature is determined by the feature name and the feature value. When dealing with time domain features, the feature name is a concatenation of the (pseudo-) channel name and the time within an epoch in seconds. A typical feature name reads, e.g., “TD_F7_0.960sec”.
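The naming scheme can be illustrated with a small parser. This is a sketch under the assumption that every time domain feature name has the form “TD_<channel>_<time>sec”, as in the example above; the function parse_time_domain_feature_name is hypothetical and is not the parsing code used by convert_feature_vector_to_time_series.

```python
def parse_time_domain_feature_name(name):
    """Split e.g. 'TD_F7_0.960sec' into ('F7', 0.96)."""
    prefix, _, rest = name.partition("_")
    if prefix != "TD" or not rest:
        raise ValueError("not a time domain feature name: %r" % (name,))
    # Split from the right so that (pseudo-) channel names containing
    # underscores stay intact.
    channel, _, time_part = rest.rpartition("_")
    if not channel or not time_part.endswith("sec"):
        raise ValueError("unexpected feature name layout: %r" % (name,))
    # Strip the 'sec' suffix and convert the time within the epoch to float.
    return channel, float(time_part[:-len("sec")])


channel, seconds = parse_time_domain_feature_name("TD_F7_0.960sec")
print(channel, seconds)  # F7 0.96
```

Given a known sampling frequency, the returned time in seconds could then be mapped to a sample index of the reconstructed time series.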