base

Module: `resources.dataset_defs.base`

Base module for datasets; it specifies the interface that all dataset classes implement.

Inheritance diagram for pySPACE.resources.dataset_defs.base: [diagram not reproduced]

Class Summary

`UnknownDatasetTypeException`	Wrapper around the error raised when a dataset type is not available
`BaseDataset`([dataset_md])	Base class for datasets

Classes

`UnknownDatasetTypeException`

class pySPACE.resources.dataset_defs.base.UnknownDatasetTypeException

Bases: exceptions.Exception

Wrapper around the error raised when a dataset type is not available.

__weakref__: List of weak references to the object (if defined).

`BaseDataset`

class pySPACE.resources.dataset_defs.base.BaseDataset(dataset_md=None)

Bases: object

Base class for datasets

BaseDataset is the base class for all dataset classes and specifies their common interface. It also provides a factory method, load, that works for all types of datasets and expects the path to the dataset's storage directory.

Subclasses must implement the following methods (add_sample is optional):

__init__: The constructor must take an argument dataset_md, a dictionary containing the meta data of the dataset to be loaded.

store: Stores the dataset in a given directory. store and __init__ should be written so that __init__ can correctly recreate every dataset stored with store.

add_sample: (optional) Adds a new sample to the dataset. BaseDataset provides a default implementation.

Datasets keep their data in the attribute self.data. This is a dictionary that maps (run, split, train/test) tuples to the data used for training or testing in the given split of the given run.
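For illustration, the sketch below shows what self.data might hold for a dataset with one run and two splits. The concrete values, the "train"/"test" strings in the keys and the lists of (sample, label) pairs are assumptions; concrete dataset classes may store their data in other containers.

```python
# Hypothetical content of self.data for one run (0) with two splits (0, 1).
# Assumed layout: (run, split, "train"/"test") -> list of (sample, label).
data = {
    (0, 0, "train"): [([0.1, 0.2], "Target"), ([0.3, 0.1], "Standard")],
    (0, 0, "test"): [([0.2, 0.2], "Target")],
    (0, 1, "train"): [([0.5, 0.4], "Standard")],
    (0, 1, "test"): [([0.0, 0.9], "Standard"), ([0.7, 0.7], "Target")],
}

# Training data of split 1 in run 0.
train_split_1 = data[(0, 1, "train")]

# Number of samples per (run, split, train/test) combination.
sizes = {key: len(samples) for key, samples in data.items()}
for key in sorted(sizes):
    print(key, sizes[key])
```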

Class Components Summary

`__del__`()	Remove the logging handler
`__repr__`()	Return a string representation of this class
`_log`(message[, level])	Log the given message with the given logging level
`add_sample`(sample, label, train[, split, run])	Add a sample to this dataset
`dump`(result_path, name)	Dump this dataset into a single file
`get_run_numbers`()	Return the numbers of the runs contained in this dataset
`get_split_numbers`([current_run])	Return the numbers of the splits
`load`(dataset_dir)	Load the dataset stored in directory dataset_dir
`load_meta_data`(dataset_dir[, file_name])	Load the meta data of the dataset
`store`(result_dir[, s_format])	Store this dataset in the directory result_dir
`store_meta_data`(dataset_dir, meta_data[, ...])	Store the meta data of a dataset
`update_meta_data`(meta_data)	Update the internal meta_data dictionary with meta_data

__init__(dataset_md=None)

classmethod load(dataset_dir)

Loads the dataset stored in directory dataset_dir

This method loads the dataset stored in the directory dataset_dir. Depending on the type stored in the dataset's meta-data file, it creates an instance of the corresponding dataset class.

The method expects the following parameters:

dataset_dir : The (absolute) directory in which the dataset to be loaded is located
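The type-based dispatch described above can be sketched as follows. Reading the meta-data file from dataset_dir is left out, so the sketch starts from an already-parsed meta-data dictionary. The registry, the register decorator and load_from_meta_data are hypothetical names; the actual load may locate the subclass differently, for example by deriving a module or class name from the type.

```python
class UnknownDatasetTypeException(Exception):
    """Raised when no dataset class is known for a meta-data type."""


class BaseDataset(object):
    # Hypothetical registry: lower-case type name -> dataset class.
    _registry = {}

    def __init__(self, dataset_md=None):
        self.meta_data = dict(dataset_md or {})

    @classmethod
    def register(cls, type_name):
        """Class decorator that records a subclass under type_name."""
        def decorator(subclass):
            cls._registry[type_name.lower()] = subclass
            return subclass
        return decorator

    @classmethod
    def load_from_meta_data(cls, meta_data):
        """Instantiate the class registered for meta_data["type"]."""
        type_name = str(meta_data.get("type", "")).lower()
        dataset_class = cls._registry.get(type_name)
        if dataset_class is None:
            raise UnknownDatasetTypeException(
                "Dataset type %r is unknown" % (meta_data.get("type"),))
        # Delegate construction to the specific dataset class.
        return dataset_class(dataset_md=meta_data)


@BaseDataset.register("feature_vector")
class FeatureVectorDataset(BaseDataset):
    """Example subclass; a real one would also implement store."""


dataset = BaseDataset.load_from_meta_data({"type": "FEATURE_VECTOR"})
print(type(dataset).__name__)  # FeatureVectorDataset
```

Keeping the lookup in the base class means callers never need to know which concrete class a stored dataset uses; an unknown type surfaces as UnknownDatasetTypeException.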

static load_meta_data(dataset_dir, file_name='metadata.yaml'): Load the meta data of the dataset

static store_meta_data(dataset_dir, meta_data, file_name='metadata.yaml'): Store the meta data of a dataset
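A round trip through these two static methods might look like the sketch below. The real methods work with YAML; to stay within the standard library, this sketch writes JSON instead (a JSON document is, apart from rare edge cases, also valid YAML 1.2). The meta-data keys "runs" and "splits" are illustrative assumptions.

```python
import json
import os
import tempfile


def store_meta_data(dataset_dir, meta_data, file_name="metadata.yaml"):
    """Write meta_data to dataset_dir/file_name, creating the directory."""
    if not os.path.isdir(dataset_dir):
        os.makedirs(dataset_dir)
    with open(os.path.join(dataset_dir, file_name), "w") as f:
        # JSON as a stdlib stand-in for YAML serialisation.
        json.dump(meta_data, f, indent=2, sort_keys=True)


def load_meta_data(dataset_dir, file_name="metadata.yaml"):
    """Read the meta-data dictionary back from dataset_dir/file_name."""
    with open(os.path.join(dataset_dir, file_name)) as f:
        return json.load(f)


target = os.path.join(tempfile.mkdtemp(), "my_dataset")
store_meta_data(target, {"type": "feature_vector", "runs": 1, "splits": 2})
print(load_meta_data(target))
```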

add_sample(sample, label, train, split=0, run=0)

Add a sample to this dataset

Adds sample, together with its class label label, to this dataset.

The method expects the following parameters:

sample : The respective data sample
label : The label of the data sample
train : If True, this sample has already been used for training
split : The number of the split this sample belongs to. Defaults to 0.
run : The number of the run this sample belongs to. Defaults to 0.
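A default add_sample along these lines could append each sample to the list stored under its (run, split, train/test) key. This is a minimal sketch that assumes the self.data layout described above and the "train"/"test" strings in the keys; it omits any meta-data bookkeeping the real implementation may perform.

```python
class BaseDataset(object):
    """Minimal stand-in for the dataset base class (illustration only)."""

    def __init__(self, dataset_md=None):
        self.meta_data = dict(dataset_md or {})
        # Assumed layout: (run, split, "train"/"test") -> [(sample, label)].
        self.data = {}

    def add_sample(self, sample, label, train, split=0, run=0):
        """Append (sample, label) to the list for its run/split/train key."""
        key = (run, split, "train" if train else "test")
        self.data.setdefault(key, []).append((sample, label))


ds = BaseDataset()
ds.add_sample([0.1, 0.2], "Target", train=True)
ds.add_sample([0.3, 0.4], "Standard", train=False)
ds.add_sample([0.5, 0.6], "Target", train=True, split=1)
print(sorted(ds.data))
```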

update_meta_data(meta_data): Updates the internal meta_data dictionary with meta_data

get_run_numbers(): Return the numbers of the runs contained in this dataset

get_split_numbers(current_run=0)

Return the numbers of the splits

Returns the numbers of the splits contained in this dataset for the given run number current_run
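Because every key of self.data starts with the run and split number, both values can be read off the keys. The sketch below assumes that key layout and that the methods return the numbers themselves (their count is then just the length of the result); it is written as free functions over a data dictionary for brevity.

```python
def get_run_numbers(data):
    """Return the sorted run numbers occurring in the keys of data."""
    return sorted(set(run for run, _split, _kind in data))


def get_split_numbers(data, current_run=0):
    """Return the sorted split numbers stored for run current_run."""
    return sorted(set(split for run, split, _kind in data
                      if run == current_run))


data = {
    (0, 0, "train"): [], (0, 0, "test"): [],
    (0, 1, "train"): [], (0, 1, "test"): [],
    (1, 0, "train"): [], (1, 0, "test"): [],
}
print(get_run_numbers(data))       # [0, 1]
print(get_split_numbers(data, 0))  # [0, 1]
print(get_split_numbers(data, 1))  # [0]
```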

dump(result_path, name)

Dumps this dataset into a file.

Dumps (i.e. pickles) this dataset object into a bz2-compressed file. Unlike store, this method writes the whole dataset object into a single file; no meta data are written to a separate YAML file.

The method expects the following parameters:

result_path : The path to the directory in which the pickle file will be written
name : The name of the pickle file
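Pickling into a bz2-compressed file needs only the standard library's pickle and bz2 modules. The sketch below assumes that dump uses name unchanged as the file name; ToyDataset and the undump helper are hypothetical and only serve to show the round trip.

```python
import bz2
import os
import pickle
import tempfile


class ToyDataset(object):
    """Hypothetical stand-in for a dataset object."""

    def __init__(self):
        self.meta_data = {"type": "toy"}
        self.data = {(0, 0, "train"): [([1.0, 2.0], "A")]}

    def dump(self, result_path, name):
        """Pickle this object into the bz2-compressed file result_path/name."""
        with bz2.BZ2File(os.path.join(result_path, name), "wb") as f:
            pickle.dump(self, f, protocol=pickle.HIGHEST_PROTOCOL)


def undump(file_path):
    """Hypothetical inverse of dump: unpickle a bz2-compressed dataset."""
    with bz2.BZ2File(file_path, "rb") as f:
        return pickle.load(f)


out_dir = tempfile.mkdtemp()
ToyDataset().dump(out_dir, "dataset.pickle.bz2")
restored = undump(os.path.join(out_dir, "dataset.pickle.bz2"))
print(restored.data)
```

Loading such a file requires the dataset class to be importable under the same module path, which is one practical difference from the directory format written by store.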

store(result_dir, s_format=None)

Stores this dataset in the directory result_dir.

Unlike dump, this method does not store the dataset in a single file but as a directory structure with meta information. The data are stored separately for each (run, split, train/test) combination.

The method expects the following parameters:

result_dir : The directory in which the dataset will be stored
s_format : The format in which the actual data sets should be stored

Note

Needs to be overridden by the subclasses!
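A subclass's store could, for instance, write one file per (run, split, train/test) key and put the meta data next to them. The directory and file names (data_run<run>, samples_sp<split>_<train|test>.pickle), pickle as the only supported s_format, and JSON as a stand-in for the YAML meta data are all assumptions for this sketch, not the layout of any particular pySPACE dataset class.

```python
import json
import os
import pickle
import tempfile


def store(data, meta_data, result_dir, s_format="pickle"):
    """Hypothetical store: one file per (run, split, train/test) key.

    Assumed layout:
        result_dir/metadata.yaml
        result_dir/data_run<run>/samples_sp<split>_<train|test>.pickle
    """
    if s_format not in (None, "pickle"):
        raise ValueError("unsupported storage format: %r" % (s_format,))
    if not os.path.isdir(result_dir):
        os.makedirs(result_dir)
    for (run, split, kind), samples in data.items():
        run_dir = os.path.join(result_dir, "data_run%d" % run)
        if not os.path.isdir(run_dir):
            os.makedirs(run_dir)
        file_name = "samples_sp%d_%s.pickle" % (split, kind)
        with open(os.path.join(run_dir, file_name), "wb") as f:
            pickle.dump(samples, f)
    # Meta data as JSON, which YAML parsers can generally read as well.
    with open(os.path.join(result_dir, "metadata.yaml"), "w") as f:
        json.dump(meta_data, f, sort_keys=True)


data = {
    (0, 0, "train"): [([1.0, 2.0], "A")],
    (0, 0, "test"): [([3.0, 4.0], "B")],
}
target = tempfile.mkdtemp()
store(data, {"type": "toy", "runs": 1, "splits": 1}, target)
for root, _dirs, files in sorted(os.walk(target)):
    for name in sorted(files):
        print(os.path.relpath(os.path.join(root, name), target))
```

A matching __init__ would walk the same layout to rebuild self.data, which is the store/__init__ symmetry the interface asks for.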

_log(message, level=20): Logs the given message with the given logging level

__del__(): Remove logging handler

__repr__(): Return a string representation of this class

__weakref__: List of weak references to the object (if defined)
