The Specs Directory
The pySPACE specs directory contains all specification files used by pySPACE: operation chain, operation, and node chain specifications. The structure of the specs directory is as follows:
Specs_Dir
    /node_chains
    /operation_chains
    /operations
The default $Specs_Dir can be found at ~/pySPACEcenter/specs/.
If one starts pySPACE with python launch.py --operation_chain example.yaml ..., the operation chain specification file example.yaml is looked up in $Specs_Dir/operation_chains. Similarly, python launch.py --operation example_operation.yaml ... will look for the file example_operation.yaml in $Specs_Dir/operations.
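The lookup itself amounts to joining the specs directory, the subdirectory for the specification type, and the file name. A minimal sketch (the helper name resolve_spec is hypothetical and not part of the pySPACE API):

import os

SPECS_DIR = os.path.expanduser("~/pySPACEcenter/specs")

def resolve_spec(spec_type, filename):
    # E.g. resolve_spec("operation_chains", "example.yaml")
    path = os.path.join(SPECS_DIR, spec_type, filename)
    if not os.path.isfile(path):
        raise IOError("No such specification file: %s" % path)
    return path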
The directory node_chains contains specifications of data processing flows, given as concatenations of nodes, that can be used within an operation of type node_chain. Consider, for instance, the NodeChainOperation specified at $Specs_Dir/operations/examples/node_chain.yaml:
# An example of a *node_chain* operation specification file.
# The specified input is the value of the entry with the key "input_path";
# the node chain templates are the value of "templates".
# This template is parametrized with two parameters called
# "__LOWER_CUTOFF__" and "__UPPER_CUTOFF__". Optionally, some "constraints"
# on the allowed parameter combinations can be defined. For instance,
# the constraint "__LOWER_CUTOFF__ < __UPPER_CUTOFF__" prevents the
# combination where both __LOWER_CUTOFF__ and __UPPER_CUTOFF__ are 2.0
# from being tested. For each combination of the given values for these two
# parameters that fulfills all constraints, and for each dataset of the
# input summary, one result dataset is created. This result dataset
# consists of the results of 10 independent runs of the instantiated
# template applied to the respective input dataset.
# Each such run is an independent process.
# The optional parameter "backend" allows overwriting the backend
# specification provided via the command line. This is useful if the
# operation is part of a chain and different operations of the chain
# should not be executed on the same modality.
type: node_chain
input_path: "example_data"
templates : ["example_flow.yaml"]
backend: "local"
parameter_ranges :
    __LOWER_CUTOFF__ : [0.1, 1.0, 2.0]
    __UPPER_CUTOFF__ : [2.0, 4.0]
constraints:
    - "__LOWER_CUTOFF__ < __UPPER_CUTOFF__"
runs : 10
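The interplay of "parameter_ranges" and "constraints" can be illustrated with a short Python sketch. This is not pySPACE's actual implementation; in particular, evaluating the constraint strings directly as Python expressions is an assumption made for brevity:

import itertools

# Values taken from the operation specification above.
parameter_ranges = {
    "__LOWER_CUTOFF__": [0.1, 1.0, 2.0],
    "__UPPER_CUTOFF__": [2.0, 4.0],
}
constraints = ["__LOWER_CUTOFF__ < __UPPER_CUTOFF__"]

names = sorted(parameter_ranges)
for values in itertools.product(*(parameter_ranges[n] for n in names)):
    combination = dict(zip(names, values))
    # Keep only combinations that satisfy every constraint expression.
    if all(eval(c, {}, combination) for c in constraints):
        print(combination)
# Five of the six combinations survive; (2.0, 2.0) violates the constraint.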
The node chain template example_flow.yaml, referenced by the "templates" entry of this operation, might look as follows:
-
    node : Time_Series_Source
-
    node : CV_Splitter
    parameters :
        splits : 10
-
    node : Devariancing
    parameters :
        devariance_method : "eval(__import__('numpy').std)"
-
    node : PCA
    parameters :
        retained_channels : 32
-
    node : FFT_Band_Pass_Filter
    parameters :
        pass_band : [${__LOWER_CUTOFF__}, ${__UPPER_CUTOFF__}]
-
    node : Subsampling
    parameters :
        target_frequency : 20.0
-
    node : BBCI_Features
    parameters :
        datapoints : [-4,-3,-2,-1]
-
    node : Labeled_Feature_Vector_CV_Sink
    parameters :
        classes_names : ["LRP", "NoLRP"]
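The placeholders in the template use the ${NAME} syntax, which happens to match Python's string.Template convention. A minimal sketch of how one parameter combination could be inserted (an illustration, not the actual pySPACE code):

import string

# One line from the template above, instantiated with one combination.
template_line = "pass_band : [${__LOWER_CUTOFF__}, ${__UPPER_CUTOFF__}]"
combination = {"__LOWER_CUTOFF__": "0.1", "__UPPER_CUTOFF__": "4.0"}
print(string.Template(template_line).substitute(combination))
# -> pass_band : [0.1, 4.0]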
Operation chain, operation, and node chain specifications are described in separate sections.
The directory weka_templates contains templates for command-line calls of Weka. These templates are parametrized, and the instantiation of a template depends on the parameters of the operation. For instance, the template for Weka classification operations looks as follows:
/usr/bin/java 1>>%(temp_results)s/../weka.out 2>> %(temp_results)s/../weka.out -Xmx2048M -Djava.awt.headless=true -classpath %(weka_class_path)s \
weka.experiment.Experiment -r \
-L %(run_number)s -U %(run_number)s \
-T %(sample_dataset)s \
-D 'weka.experiment.CSVResultListener -O %(temp_results)s/results_%(unique_id)s.csv' \
-P weka.experiment.ExternalCrossValidationResultProducer \
-- \
-X %(cv_folds)s \
-S %(data_pattern)s \
-W weka.experiment.ClassifierSplitEvaluator \
-- \
-C %(ir_class_index)s \
%(classifier)s
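The %(name)s placeholders in this template are Python mapping-based string-formatting fields; filling them can be sketched as follows (the parameter values below are made up for illustration):

# An abbreviated fragment of the template above.
template = ("-L %(run_number)s -U %(run_number)s "
            "-X %(cv_folds)s -C %(ir_class_index)s %(classifier)s")
parameters = {
    "run_number": 1,
    "cv_folds": 10,
    "ir_class_index": 2,
    "classifier": "-W weka.classifiers.bayes.NaiveBayes",
}
print(template % parameters)
# -> -L 1 -U 1 -X 10 -C 2 -W weka.classifiers.bayes.NaiveBayes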
The operation sets most of the parameters automatically; for instance, %(run_number)s is automatically replaced with the respective run number. %(classifier)s is replaced by the classifier that should be used, for instance '-W weka.classifiers.bayes.NaiveBayes' for the Naive Bayes classifier. An operation specification file for an operation using such a Weka template might look as follows:
type: weka_classification
input_path: "arff_dataset_summary"
template: classification
parameter_ranges :
    classifier: ['smo']
    ir_class_index: [2]
    complexity: [1.0]
    exponent: [1.0]
Here, "template: classification" controls which Weka template is used, the "classifier" entry (here 'smo') determines which classifier is inserted into the template's parameter %(classifier)s, and "ir_class_index" controls on which class information-retrieval metrics like precision and recall are based.
The mapping from a name such as 'naive_bayes' to the actual text '-W weka.classifiers.bayes.NaiveBayes' that is inserted into the template is handled by the file "abbreviations.yaml" (a sketch of this substitution follows the list). The following abbreviations for Weka are currently defined:
# Classifiers
naive_bayes: '-W weka.classifiers.bayes.NaiveBayes'
logistic: '-W weka.classifiers.functions.Logistic'
smo: '-W weka.classifiers.functions.SMO -- -C %(complexity)s -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E %(exponent)s"'
smo_RBF: '-W weka.classifiers.functions.SMO -- -C %(complexity)s -K "weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G %(gamma)s"'
libsvm_lin: '-W weka.classifiers.functions.LibSVM -- -S 0 -K 0 -D 0 -G 0.0 -R 0.0 -N 0.5 -M 40.0 -C %(complexity)s -E 0.0010 -P 0.1 -W "%(w0)s %(w1)s"'
libsvm_pol: '-W weka.classifiers.functions.LibSVM -- -S 0 -K 1 -D %(exponent)s -G %(gamma)s -R %(offset)s -N 0.5 -M 40.0 -C %(complexity)s -E 0.0010 -P 0.1 -W "%(w0)s %(w1)s"'
libsvm_RBF: '-W weka.classifiers.functions.LibSVM -- -S 0 -K 2 -D 0 -G %(gamma)s -R %(offset)s -N 0.5 -M 40.0 -C %(complexity)s -E 0.0010 -P 0.1 -W "%(w0)s %(w1)s"'
libsvm_sig: '-W weka.classifiers.functions.LibSVM -- -S 0 -K 3 -D 0 -G %(gamma)s -R %(offset)s -N 0.5 -M 40.0 -C %(complexity)s -E 0.0010 -P 0.1 -W "%(w0)s %(w1)s"'
ib_1: '-W weka.classifiers.lazy.IB1'
lwl: 'weka.classifiers.lazy.LWL -U 0 -K -1 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\"" -W weka.classifiers.trees.DecisionStump'
random_forest: '-W weka.classifiers.trees.RandomForest -- -I 10 -K 0 -S 1'
bagging_j48: '-W weka.classifiers.meta.Bagging -- -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2'
bagging_part: '-W weka.classifiers.meta.Bagging -- -P 100 -S 1 -I 10 -W weka.classifiers.rules.PART -- -M 2 -C 0.25 -Q 1'
adaboost_j48: '-W weka.classifiers.meta.AdaBoostM1 -- -S 1 -I 10 -P 100 -- -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2'
adaboost_part: '-W weka.classifiers.meta.AdaBoostM1 -- -S 1 -I 10 -P 100 -- -P 100 -S 1 -I 10 -W weka.classifiers.rules.PART -- -M 2 -C 0.25 -Q 1'
multiboost_j48: '-W weka.classifiers.meta.MultiBoostAB -- -C 3 -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2'
multiboost_part: '-W weka.classifiers.meta.MultiBoostAB -- -C 3 -P 100 -S 1 -I 10 -W weka.classifiers.rules.PART -- -M 2 -C 0.25 -Q 1'
# Feature Selection
chi_squared: 'weka.attributeSelection.ChiSquaredAttributeEval'
info_gain: 'weka.attributeSelection.InfoGainAttributeEval'
gain_ratio: 'weka.attributeSelection.GainRatioAttributeEval'
relief: 'weka.attributeSelection.ReliefFAttributeEval'
CFS: 'weka.attributeSelection.CfsSubsetEval'
# Ranker
ranker: 'weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N %(num_retained_features)s'
greedy_stepwise: 'weka.attributeSelection.GreedyStepwise -T -1.7976931348623157E308 -N %(num_retained_features)s'
best_first: 'weka.attributeSelection.BestFirst -N %(num_non-improving_nodes)s'
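The substitution happens in two stages: the name from the operation specification selects an abbreviation, and the %(...)s placeholders inside the abbreviation are then filled from the operation's parameters. A minimal sketch (an assumption about the mechanics, not pySPACE's verbatim code):

# Two entries from abbreviations.yaml, inlined for a self-contained example.
abbreviations = {
    "naive_bayes": '-W weka.classifiers.bayes.NaiveBayes',
    "smo": '-W weka.classifiers.functions.SMO -- -C %(complexity)s '
           '-K "weka.classifiers.functions.supportVector.PolyKernel '
           '-C 250007 -E %(exponent)s"',
}
parameters = {"complexity": 1.0, "exponent": 1.0}
# The expanded string replaces %(classifier)s in the Weka template.
classifier = abbreviations["smo"] % parameters
print(classifier)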
The usage of the windower folder is described elsewhere in the documentation; it specifies one connection between datasets and the NodeChainOperation.