base

Module: resources.dataset_defs.base

Base module that specifies the common interface for all dataset classes

Inheritance diagram for pySPACE.resources.dataset_defs.base:


Class Summary

UnknownDatasetTypeException Exception raised when a dataset type is not available
BaseDataset([dataset_md]) Base class for datasets

Classes

UnknownDatasetTypeException

class pySPACE.resources.dataset_defs.base.UnknownDatasetTypeException[source]

Bases: exceptions.Exception

Exception raised when a dataset type is not available

__weakref__

list of weak references to the object (if defined)

BaseDataset

class pySPACE.resources.dataset_defs.base.BaseDataset(dataset_md=None)[source]

Bases: object

Base class for datasets

This class (BaseDataset) acts as the base class for all dataset classes and specifies their common interface. Furthermore, it provides the factory method load for all types of datasets, which expects the path to the dataset's storage directory.

The following methods must be implemented:

__init__ : The constructor must take an argument dataset_md that is a dictionary containing meta data for the dataset to be loaded.
store : A method that stores a dataset in a certain directory. store and __init__ should be written such that __init__ can correctly recreate every dataset stored with store.
add_sample : (optional) Adds a new sample to the dataset. BaseDataset provides a default implementation.

Datasets store the data in the attribute self.data. This data is stored as a dictionary that maps (run, split, train/test) tuples to the actual data obtained in this split in this run for training/testing.
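As an illustration of this layout, a dataset with two runs might hold a self.data dictionary like the following (the sample objects and labels are placeholders; real datasets store e.g. time series or feature vectors):

```python
# Illustrative layout of the self.data dictionary described above.
data = {
    (0, 0, "train"): [("sample_a", "LabelA"), ("sample_b", "LabelB")],
    (0, 0, "test"): [("sample_c", "LabelA")],
    (1, 0, "train"): [("sample_d", "LabelB")],
}

# Run and split numbers can be recovered from the dictionary keys:
runs = sorted({key[0] for key in data})                        # runs present
splits_run0 = sorted({key[1] for key in data if key[0] == 0})  # splits of run 0
```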

Class Components Summary

__del__() Remove logging handler
__repr__() Return a string representation of this class
_log(message[, level]) Logs the given message with the given logging level
add_sample(sample, label, train[, split, run]) Add a sample to this dataset
dump(result_path, name) Dumps this dataset into a file.
get_run_numbers() Return the number of the runs contained in this dataset
get_split_numbers([current_run]) Return the number of the splits
load(dataset_dir) Loads the dataset stored in directory dataset_dir
load_meta_data(dataset_dir[, file_name]) Load the meta data of the dataset
store(result_dir[, s_format]) Stores this dataset in the directory result_dir.
store_meta_data(dataset_dir, meta_data[, ...]) Stores the meta data of a dataset
update_meta_data(meta_data) Updates the internal meta_data dictionary with meta_data
__init__(dataset_md=None)[source]
classmethod load(dataset_dir)[source]

Loads the dataset stored in the directory dataset_dir

This method loads the dataset stored in the directory dataset_dir. Depending on the type stored in the dataset's meta-data file, the method creates an instance of the corresponding dataset class.

The method expects the following parameters:
  • dataset_dir : The (absolute) directory in which the dataset that will be loaded is located
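The dispatch behind this factory method can be sketched as follows. The registry, the class name, and the load_sketch function are invented for illustration; the real method reads the "type" entry from metadata.yaml and raises UnknownDatasetTypeException when no matching class exists:

```python
# Hypothetical sketch of the type dispatch performed by load.
class FeatureVectorDataset:
    """Stand-in for a concrete dataset class selected by its type name."""

    def __init__(self, dataset_md=None):
        self.meta_data = dataset_md


# Invented registry mapping type names (from metadata.yaml) to classes.
DATASET_TYPES = {"feature_vector": FeatureVectorDataset}


def load_sketch(meta_data):
    try:
        dataset_class = DATASET_TYPES[meta_data["type"]]
    except KeyError:
        # pySPACE raises UnknownDatasetTypeException here.
        raise ValueError("Unknown dataset type: %r" % meta_data["type"])
    return dataset_class(dataset_md=meta_data)


dataset = load_sketch({"type": "feature_vector"})
```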
static load_meta_data(dataset_dir, file_name='metadata.yaml')[source]

Load the meta data of the dataset

static store_meta_data(dataset_dir, meta_data, file_name='metadata.yaml')[source]

Stores the meta data of a dataset

add_sample(sample, label, train, split=0, run=0)[source]

Add a sample to this dataset

Adds the sample sample along with its class label label to this dataset.

The method expects the following parameters:
  • sample : The respective data sample
  • label : The label of the data sample
  • train : If True, the sample is added to the training data; otherwise to the test data
  • split : The number of the split this sample belongs to. Defaults to 0.
  • run : The run number this sample belongs to. Defaults to 0.
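The effect of the default implementation on self.data can be illustrated with a stand-in class (the real BaseDataset additionally handles logging and other bookkeeping):

```python
# Stand-in illustrating how add_sample fills the (run, split, train/test)
# keyed dictionary; sample values and labels are placeholders.
class Demo:
    def __init__(self):
        self.data = {}

    def add_sample(self, sample, label, train, split=0, run=0):
        key = (run, split, "train" if train else "test")
        self.data.setdefault(key, []).append((sample, label))


d = Demo()
d.add_sample([0.1, 0.2], "Target", train=True)            # run=0, split=0
d.add_sample([0.3, 0.4], "Standard", train=False, run=1)  # test data of run 1
keys = sorted(d.data)  # -> [(0, 0, 'train'), (1, 0, 'test')]
```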
update_meta_data(meta_data)[source]

Updates the internal meta_data dictionary with meta_data

get_run_numbers()[source]

Return the number of the runs contained in this dataset

get_split_numbers(current_run=0)[source]

Return the number of the splits

Returns the number of splits contained in this dataset for the given run number current_run

dump(result_path, name)[source]

Dumps this dataset into a file.

Dumps (i.e. pickles) this dataset object into a bz2-compressed file. In contrast to store, this method stores the whole dataset in a single file; no meta data is written to a separate YAML file.

The method expects the following parameters:
  • result_path : The path to the directory in which the pickle file will be written
  • name : The name of the pickle file
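The mechanics can be sketched with the standard pickle and bz2 modules. The file-name scheme below ("<name>.pickle") is an assumption; only the pickle-into-bz2 idea is taken from the description above:

```python
import bz2
import os
import pickle
import tempfile


def dump_sketch(dataset, result_path, name):
    # Pickle the whole dataset object into one bz2-compressed file.
    target = os.path.join(result_path, name + ".pickle")
    with bz2.BZ2File(target, "wb") as fh:
        pickle.dump(dataset, fh)
    return target


tmp = tempfile.mkdtemp()
path = dump_sketch({"meta": "demo"}, tmp, "demo_dataset")

# The compressed file round-trips back to the original object.
with bz2.BZ2File(path, "rb") as fh:
    restored = pickle.load(fh)
```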
store(result_dir, s_format=None)[source]

Stores this dataset in the directory result_dir.

In contrast to dump, this method stores the dataset not in a single file but as a whole directory structure with meta information etc. The data are stored separately for each (run, split, train/test) combination.

The method expects the following parameters:
  • result_dir : The directory in which the dataset will be stored
  • s_format : The format in which the actual data sets should be stored

Note

Needs to be overwritten by the subclasses!

_log(message, level=20)[source]

Logs the given message with the given logging level

__del__()[source]

Remove logging handler

__repr__()[source]

Return a string representation of this class

__weakref__

list of weak references to the object (if defined)