The Specs Directory

The pySPACE specs directory contains the specification files used by pySPACE, for instance operation chain, operation, and node chain specification files. The structure of the specs directory is as follows:

Specs_Dir
             /node_chains
             /operation_chains
             /operations

The default $Specs_Dir is ~/pySPACEcenter/specs/. If pySPACE is started with python launch.py --operation_chain example.yaml ..., the operation chain specification file ‘example.yaml’ is looked up in $Specs_Dir/operation_chains. Similarly, python launch.py --operation example_operation.yaml ... looks for the file example_operation.yaml in $Specs_Dir/operations. The directory node_chains contains specifications of data processing flows, each a concatenation of nodes, that can be used within an operation of type node_chain. For instance, the node_chain used in the NodeChainOperation found at $Specs_Dir/operations/examples/node_chain.yaml:

# An example of a *node_chain* specification file.
# The input is given by the value of the entry with the key "input_path",
# the node chain templates by the value of "templates".
# This template is parametrized with two parameters called
# "__LOWER_CUTOFF__" and "__UPPER_CUTOFF__". Optionally, "constraints"
# on the allowed parameter combinations can be defined. For instance,
# the constraint "__LOWER_CUTOFF__ < __UPPER_CUTOFF__" excludes
# the combination where both __LOWER_CUTOFF__ and __UPPER_CUTOFF__ are 2.0.
# For each combination of the given values for these two parameters
# that fulfills all constraints, and for each dataset of the
# input summary, one result dataset is created. This result dataset
# consists of the results of 10 independent runs of the
# instantiated template on the respective input dataset.
# Each such run is an independent process.

# The optional parameter "backend" overrides the backend specification provided
# via the command line. This is useful if the operation is part of a chain
# and different operations of the chain should
# not be executed on the same modality.

type: node_chain

input_path: "example_data"
templates : ["example_flow.yaml"]
backend: "local"
parameter_ranges :
    __LOWER_CUTOFF__ : [0.1, 1.0, 2.0]
    __UPPER_CUTOFF__ : [2.0, 4.0]
constraints:
    - "__LOWER_CUTOFF__ < __UPPER_CUTOFF__"


runs : 10

might look as follows:

- 
    node: Time_Series_Source
-
    node : CV_Splitter
    parameters : 
        splits : 10
-
    node : Devariancing
    parameters : 
        devariance_method : "eval(__import__('numpy').std)"
- 
    node : PCA
    parameters : 
        retained_channels : 32
- 
    node : FFT_Band_Pass_Filter
    parameters : 
        pass_band : [${__LOWER_CUTOFF__}, ${__UPPER_CUTOFF__}]
- 
    node : Subsampling
    parameters : 
        target_frequency : 20.0
-
    node : BBCI_Features
    parameters : 
        datapoints : [-4,-3,-2,-1]
- 
    node: Labeled_Feature_Vector_CV_Sink
    parameters :
        classes_names : ["LRP", "NoLRP"]
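
The comments in the operation specification describe how parameter ranges and constraints determine which instantiations of the template are run. The following Python sketch illustrates that idea for the values used above. It is not pySPACE's own implementation: the helper names satisfies and parameter_combinations are hypothetical, the use of string.Template for the ${...} placeholders is an assumption, and evaluating the constraint with eval is only one possible approach.

```python
import itertools
from string import Template

# Values copied from the operation specification above.
parameter_ranges = {
    "__LOWER_CUTOFF__": [0.1, 1.0, 2.0],
    "__UPPER_CUTOFF__": [2.0, 4.0],
}
constraints = ["__LOWER_CUTOFF__ < __UPPER_CUTOFF__"]


def satisfies(constraint, assignment):
    """Hypothetical helper: insert the values into the constraint text and evaluate it."""
    expression = constraint
    for name, value in assignment.items():
        expression = expression.replace(name, repr(value))
    # eval is used only because the constraint comes from a trusted spec file.
    return bool(eval(expression, {"__builtins__": {}}, {}))


def parameter_combinations(ranges, constraints):
    """Yield each assignment of the Cartesian product that fulfills all constraints."""
    names = sorted(ranges)
    for values in itertools.product(*(ranges[name] for name in names)):
        assignment = dict(zip(names, values))
        if all(satisfies(c, assignment) for c in constraints):
            yield assignment


combinations = list(parameter_combinations(parameter_ranges, constraints))

# One line of the node chain template, instantiated for each remaining combination.
line = Template("pass_band : [${__LOWER_CUTOFF__}, ${__UPPER_CUTOFF__}]")
for assignment in combinations:
    print(line.substitute(assignment))
```

Of the six possible combinations, only the one where both cutoffs are 2.0 is rejected, so five instantiations remain. According to the specification, each of them would then be run 10 times on every input dataset.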

Operation chain, operation and node chain specifications are described in extra sections.
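
The lookup rule for specification files described at the beginning of this section can be summarized in a few lines of Python. This is only a sketch under the assumptions stated in the text; the function resolve_spec_file and the dictionary SUBDIRECTORIES are hypothetical names and not part of pySPACE.

```python
import os

# Default location named in the text; "~" is expanded to the user's home directory.
DEFAULT_SPECS_DIR = os.path.expanduser(os.path.join("~", "pySPACEcenter", "specs"))

# Hypothetical mapping from the launch.py option to the subdirectory that is searched.
SUBDIRECTORIES = {
    "operation_chain": "operation_chains",
    "operation": "operations",
}


def resolve_spec_file(option, file_name, specs_dir=DEFAULT_SPECS_DIR):
    """Return the path where the given specification file would be looked up."""
    return os.path.join(specs_dir, SUBDIRECTORIES[option], file_name)


print(resolve_spec_file("operation_chain", "example.yaml"))
print(resolve_spec_file("operation", "example_operation.yaml"))
```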

The directory weka_templates contains templates for command line calls of Weka. These templates are parametrized, and the respective instantiation of a template depends on the parameters of the operation. For instance, the template for Weka classification operations looks as follows:

/usr/bin/java 1>>%(temp_results)s/../weka.out 2>> %(temp_results)s/../weka.out -Xmx2048M -Djava.awt.headless=true -classpath %(weka_class_path)s \
weka.experiment.Experiment -r \
-L %(run_number)s -U %(run_number)s \
-T %(sample_dataset)s \
-D 'weka.experiment.CSVResultListener -O %(temp_results)s/results_%(unique_id)s.csv' \
-P weka.experiment.ExternalCrossValidationResultProducer \
-- \
  -X %(cv_folds)s \
  -S %(data_pattern)s \
  -W weka.experiment.ClassifierSplitEvaluator \
  -- \
     -C %(ir_class_index)s \
     %(classifier)s 
     

The operation sets most of the parameters automatically; for instance, %(run_number)s is automatically replaced with the respective run_number. %(classifier)s is replaced by the classifier that should be used, for instance ‘-W weka.classifiers.bayes.NaiveBayes’ for the Naive Bayes classifier. An operation specification file for an operation using such a Weka template might look as follows:

type: weka_classification

input_path: "arff_dataset_summary"

template: classification
parameter_ranges :
    classifier: ['smo']
    ir_class_index: [2]
    complexity: [1.0]
    exponent: [1.0]

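Here, “template: classification” controls which Weka template is used, “classifier: ['smo']” determines the classifier that is inserted into the template's parameter %(classifier)s, and “ir_class_index” controls on which class information retrieval metrics like precision and recall are based. The entries “complexity” and “exponent” provide the values for the placeholders %(complexity)s and %(exponent)s in the text that the name ‘smo’ stands for.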
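
The %(name)s placeholders in the Weka template follow the syntax of Python's dictionary-based string formatting. As a minimal sketch, assuming the template is filled with the % operator, a shortened version of the template can be instantiated as shown below. The dataset path is made up for illustration, and the real operation may fill the template differently.

```python
# Shortened version of the Weka template above; only some placeholders are kept.
template = (
    "weka.experiment.Experiment -r "
    "-L %(run_number)s -U %(run_number)s "
    "-T %(sample_dataset)s "
    "-- -C %(ir_class_index)s %(classifier)s"
)

# Illustrative values; the dataset path is hypothetical.
values = {
    "run_number": 3,
    "sample_dataset": "/tmp/example.arff",
    "ir_class_index": 2,
    "classifier": "-W weka.classifiers.bayes.NaiveBayes",
}

# Every %(key)s is replaced by the string form of values[key].
command = template % values
print(command)
```

With this kind of formatting, a placeholder without a matching key raises a KeyError, so each parameter used in a template has to be provided either by the operation itself or by the specification file.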

The mapping from a name such as ‘naive_bayes’ to the actual text ‘-W weka.classifiers.bayes.NaiveBayes’ that is inserted into the template is handled by the file “abbreviations.yaml”. The following abbreviations for Weka are currently defined:

# Classifiers
naive_bayes: '-W weka.classifiers.bayes.NaiveBayes'
logistic: '-W weka.classifiers.functions.Logistic'
smo: '-W weka.classifiers.functions.SMO -- -C %(complexity)s -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E %(exponent)s"'
smo_RBF: '-W weka.classifiers.functions.SMO -- -C %(complexity)s -K "weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G %(gamma)s"'
libsvm_lin: '-W weka.classifiers.functions.LibSVM -- -S 0 -K 0 -D 0 -G 0.0 -R 0.0 -N 0.5 -M 40.0 -C %(complexity)s -E 0.0010 -P 0.1 -W "%(w0)s %(w1)s"'
libsvm_pol: '-W weka.classifiers.functions.LibSVM -- -S 0 -K 1 -D %(exponent)s -G %(gamma)s -R %(offset)s -N 0.5 -M 40.0 -C %(complexity)s -E 0.0010 -P 0.1 -W "%(w0)s %(w1)s"'
libsvm_RBF: '-W weka.classifiers.functions.LibSVM -- -S 0 -K 2 -D 0 -G %(gamma)s -R %(offset)s -N 0.5 -M 40.0 -C %(complexity)s -E 0.0010 -P 0.1 -W "%(w0)s %(w1)s"'
libsvm_sig: '-W weka.classifiers.functions.LibSVM -- -S 0 -K 3 -D 0 -G %(gamma)s -R %(offset)s -N 0.5 -M 40.0 -C %(complexity)s -E 0.0010 -P 0.1 -W "%(w0)s %(w1)s"'
ib_1: '-W weka.classifiers.lazy.IB1'
lwl: 'weka.classifiers.lazy.LWL -U 0 -K -1 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\"" -W weka.classifiers.trees.DecisionStump'
random_forest: '-W weka.classifiers.trees.RandomForest -- -I 10 -K 0 -S 1'
bagging_j48: '-W weka.classifiers.meta.Bagging -- -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2'
bagging_part: '-W weka.classifiers.meta.Bagging -- -P 100 -S 1 -I 10 -W weka.classifiers.rules.PART -- -M 2 -C 0.25 -Q 1'
adaboost_j48: '-W weka.classifiers.meta.AdaBoostM1 -- -S 1 -I 10 -P 100 -- -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2'
adaboost_part: '-W weka.classifiers.meta.AdaBoostM1 -- -S 1 -I 10 -P 100 -- -P 100 -S 1 -I 10 -W weka.classifiers.rules.PART -- -M 2 -C 0.25 -Q 1'
multiboost_j48: '-W weka.classifiers.meta.MultiBoostAB -- -C 3 -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2'
multiboost_part: '-W weka.classifiers.meta.MultiBoostAB -- -C 3 -P 100 -S 1 -I 10 -W weka.classifiers.rules.PART -- -M 2 -C 0.25 -Q 1'

# Feature Selection
chi_squared: 'weka.attributeSelection.ChiSquaredAttributeEval'
info_gain: 'weka.attributeSelection.InfoGainAttributeEval'
gain_ratio: 'weka.attributeSelection.GainRatioAttributeEval'
relief: 'weka.attributeSelection.ReliefFAttributeEval'
CFS: 'weka.attributeSelection.CfsSubsetEval'

# Ranker
ranker: 'weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N %(num_retained_features)s'
greedy_stepwise: 'weka.attributeSelection.GreedyStepwise -T -1.7976931348623157E308 -N %(num_retained_features)s'
best_first: 'weka.attributeSelection.BestFirst -N %(num_non-improving_nodes)s'
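
Some of the abbreviations above contain placeholders of their own, such as %(complexity)s and %(exponent)s in the entry for ‘smo’. The following Python sketch shows how such an entry could be expanded with the values from an operation specification before it is inserted into a template. The function expand_classifier is a hypothetical helper, and the dictionary holds only two of the entries listed above.

```python
# Two entries copied from the abbreviation list above.
abbreviations = {
    "naive_bayes": "-W weka.classifiers.bayes.NaiveBayes",
    "smo": (
        '-W weka.classifiers.functions.SMO -- -C %(complexity)s -K '
        '"weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E %(exponent)s"'
    ),
}


def expand_classifier(name, parameters):
    """Hypothetical helper: look up an abbreviation and fill in its placeholders."""
    return abbreviations[name] % parameters


# Values taken from the example operation specification.
parameters = {"complexity": 1.0, "exponent": 1.0, "ir_class_index": 2}

classifier_text = expand_classifier("smo", parameters)
print(classifier_text)
```

Keys in the dictionary that the abbreviation does not use, such as ir_class_index here, are simply ignored, and an entry without placeholders such as ‘naive_bayes’ is returned unchanged.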

The usage of the windower folder is described in a separate section. It specifies one connection between datasets and the NodeChainOperation.