Module: resources.dataset_defs.prediction_vector

Load and store data sets containing Prediction Vectors


class pySPACE.resources.dataset_defs.prediction_vector.PredictionVectorDataset(dataset_md=None, num_predictors=1, **kwargs)

Bases: pySPACE.resources.dataset_defs.base.BaseDataset

Prediction Vector dataset class

This class contains the methods needed to work with datasets consisting of prediction vectors.

The following data formats are currently supported:
  • *.csv - with or without a header row
  • *.pickle

TODO: Add functionality for the *.arff format.


The implementation of this dataset is adapted from the FeatureVectorDataset and the TimeSeriesDataset. For more thorough documentation, we refer the reader to those two dataset classes.


Parameters

dataset_md: Dictionary containing metadata for the collection to be loaded. Of these parameters, the most important one is the number of predictors, since it determines how the prediction vectors will be generated.
num_predictors: The number of predictors that each PredictionVector contains. This parameter determines the dimensionality of the PredictionVector.

Special CSV Parameters


delimiter: Needed only when dealing with *.csv files. When loading, ',' is used by default; the tab character is the usual alternative. When storing, ',' is used.

(recommended, default: ‘,’)


label_column: Column containing the true label of the data point.

Normally this column loses its heading. When saving the CSV file, the default, -1, is used.

(recommended, default: -1)


ignored_columns: List of the numbers of irrelevant columns, e.g., [1,2,8,42]

After the data is loaded, this parameter becomes obsolete.

(optional, default: [])


ignored_rows: Same as ‘ignored_columns’, but for rows instead of columns.

(optional, default: [])
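How the CSV parameters described above interact can be sketched with Python's standard csv module. This is only an illustration, not pySPACE's loading code: the helper name read_prediction_csv is hypothetical, the keyword names are assumed to mirror the parameters described above, and treating column and row numbers as 0-based is also an assumption.

```python
import csv
import io


def read_prediction_csv(text, delimiter=",", label_column=-1,
                        ignored_columns=(), ignored_rows=(),
                        num_predictors=1):
    """Parse CSV text into (predictions, label) pairs.

    Hypothetical helper for illustration only; not part of pySPACE.
    Row and column numbers are assumed to be 0-based.
    """
    samples = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row_nr, row in enumerate(reader):
        # Skip rows the caller marked as irrelevant (e.g. a header row)
        if row_nr in ignored_rows or not row:
            continue
        # Resolve a negative label column such as -1 (the last column)
        label_idx = label_column % len(row)
        label = row[label_idx]
        # Everything that is neither the label nor ignored is a prediction
        values = [float(cell) for col_nr, cell in enumerate(row)
                  if col_nr != label_idx and col_nr not in ignored_columns]
        if len(values) != num_predictors:
            raise ValueError(
                "row %d has %d predictions, expected %d"
                % (row_nr, len(values), num_predictors))
        samples.append((values, label))
    return samples


csv_text = "id,p1,p2,label\n0,0.9,0.8,Target\n1,-0.4,-0.1,Standard\n"
samples = read_prediction_csv(csv_text, ignored_columns=[0],
                              ignored_rows=[0], num_predictors=2)
print(samples)
```

In this sketch the header is skipped by listing row 0 in ignored_rows, and the check against num_predictors shows why that value fixes the dimensionality of every vector.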

Author: Andrei Ignat

Class Components Summary

add_sample(sample, label, train[, split, run])   Add a prediction vector to this collection
dump(result_path, name)                          Dumps this collection into a file
get_data(run_nr, split_nr, train_test)           Load the data from a prediction file
store(result_dir[, s_format])                    store the collection in result_dir

__init__(dataset_md=None, num_predictors=1, **kwargs)

Read out the data from the given collection

add_sample(sample, label, train, split=0, run=0)

Add a prediction vector to this collection

dump(result_path, name)

Dumps this collection into a file

get_data(run_nr, split_nr, train_test)

Load the data from a prediction file

store(result_dir, s_format=['pickle', 'real'])

store the collection in result_dir
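
To make the method signatures above more concrete, the following is a minimal stand-in that mirrors the names add_sample, get_data and dump. It is not pySPACE's implementation: the class name TinyPredictionCollection, the grouping of samples by (run, split, "train"/"test"), and the ".pickle" file suffix are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


class TinyPredictionCollection:
    """Hypothetical stand-in that mirrors the method names listed above."""

    def __init__(self, num_predictors=1):
        self.num_predictors = num_predictors
        # Assumed layout: samples grouped by (run, split, "train" or "test")
        self.data = {}

    def add_sample(self, sample, label, train, split=0, run=0):
        """Add one prediction vector with its label to the collection."""
        if len(sample) != self.num_predictors:
            raise ValueError("sample does not match num_predictors")
        key = (run, split, "train" if train else "test")
        self.data.setdefault(key, []).append((list(sample), label))

    def get_data(self, run_nr, split_nr, train_test):
        """Return the (sample, label) pairs stored for one run and split."""
        return self.data.get((run_nr, split_nr, train_test), [])

    def dump(self, result_path, name):
        """Pickle the whole collection into result_path (suffix assumed)."""
        path = os.path.join(result_path, name + ".pickle")
        with open(path, "wb") as handle:
            pickle.dump(self.data, handle)
        return path


collection = TinyPredictionCollection(num_predictors=2)
collection.add_sample([0.9, 0.8], "Target", train=True)
collection.add_sample([-0.4, -0.1], "Standard", train=False)

# Round trip through a temporary directory to show that dump is lossless
with tempfile.TemporaryDirectory() as tmp_dir:
    path = collection.dump(tmp_dir, "predictions")
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored[(0, 0, "train")])
```

The defaults split=0 and run=0 are taken from the add_sample signature, so samples added without those arguments land in run 0, split 0.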