base
Module: resources.dataset_defs.base
Base module for datasets. It specifies the interface that all dataset classes share.
[Figure: inheritance diagram for pySPACE.resources.dataset_defs.base]
Class Summary
UnknownDatasetTypeException | Wrapper around the error raised when a dataset type is not available
BaseDataset([dataset_md]) | Base class for datasets
Classes
BaseDataset
class pySPACE.resources.dataset_defs.base.BaseDataset(dataset_md=None)
Bases: object
Base class for datasets
This class (BaseDataset) is the base class for all dataset classes and specifies their interface. It also provides a factory method, load, for all types of datasets. load expects the path to the dataset's storage directory.
The following methods must be implemented:
- __init__: The constructor must take an argument dataset_md, a dictionary containing the meta data of the dataset to be loaded.
- store: A method that stores a dataset in a given directory. store and __init__ should be written so that __init__ can correctly recreate every dataset stored with store.
- add_sample: (optional) Adds a new sample to the dataset. BaseDataset provides a default implementation.
Datasets keep their data in the attribute self.data. This is a dictionary that maps (run, split, train/test) tuples to the actual data obtained in this split in this run for training/testing.
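The interface described above can be illustrated with a small stand-alone sketch. It does not import pySPACE: SketchDataset is a hypothetical stand-in for a BaseDataset subclass, and using the strings "train" and "test" as the third element of each key is an assumption, since the text only says train/test.

```python
from collections import defaultdict


class SketchDataset:
    """Hypothetical stand-in for a BaseDataset subclass (not pySPACE code)."""

    def __init__(self, dataset_md=None):
        # Meta data dictionary, as required by the documented constructor.
        self.meta_data = dict(dataset_md or {})
        # Maps (run, split, "train"/"test") -> list of (sample, label) pairs.
        # The "train"/"test" strings are an assumption of this sketch.
        self.data = defaultdict(list)

    def add_sample(self, sample, label, train, split=0, run=0):
        # Same parameter order as the documented add_sample signature.
        key = (run, split, "train" if train else "test")
        self.data[key].append((sample, label))

    def store(self, result_dir, s_format=None):
        # Each concrete dataset class has to supply its own storage logic.
        raise NotImplementedError("store must be implemented by subclasses")


dataset = SketchDataset({"type": "sketch"})
dataset.add_sample([0.1, 0.2], "A", train=True)
dataset.add_sample([0.3, 0.4], "B", train=False, split=1)
```

After these two calls, dataset.data holds one entry under (0, 0, "train") and one under (0, 1, "test").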
Class Components Summary
__del__() | Remove logging handler
__repr__() | Return a string representation of this class
_log(message[, level]) | Logs the given message with the given logging level
add_sample(sample, label, train[, split, run]) | Add a sample to this dataset
dump(result_path, name) | Dumps this dataset into a file.
get_run_numbers() | Return the number of the runs contained in this dataset
get_split_numbers([current_run]) | Return the number of the splits
load(dataset_dir) | Loads the dataset stored in the directory dataset_dir
load_meta_data(dataset_dir[, file_name]) | Load the meta data of the dataset
store(result_dir[, s_format]) | Stores this dataset in the directory result_dir.
store_meta_data(dataset_dir, meta_data[, ...]) | Stores the meta data of a dataset
update_meta_data(meta_data) | Updates the internal meta_data dictionary with meta_data
classmethod load(dataset_dir)
Loads the dataset stored in the directory dataset_dir
This method loads the dataset stored in the directory dataset_dir. Depending on the type stored in the dataset's meta-data file, the method creates an instance of a specific dataset class.
- The method expects the following parameters:
- dataset_dir : The (absolute) directory in which the dataset that will be loaded is located
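The type-based dispatch that load performs can be sketched as follows. This is not pySPACE's implementation. The registry, the register decorator, the load_from_meta_data helper, the "type" key and the "time_series" type name are all hypothetical, and the meta data is passed in as a dictionary instead of being read from a meta-data file.

```python
class UnknownDatasetTypeException(Exception):
    """Raised when no dataset class is registered for a type (sketch only)."""


class DatasetSketch:
    # Hypothetical registry mapping type names to dataset classes.
    registry = {}

    def __init__(self, dataset_md=None):
        self.meta_data = dict(dataset_md or {})

    @classmethod
    def register(cls, type_name):
        """Class decorator that records a subclass under type_name."""
        def decorator(subclass):
            cls.registry[type_name] = subclass
            return subclass
        return decorator

    @classmethod
    def load_from_meta_data(cls, meta_data):
        # Pick the concrete class from the "type" entry of the meta data.
        type_name = meta_data.get("type")
        try:
            dataset_class = cls.registry[type_name]
        except KeyError:
            raise UnknownDatasetTypeException(
                "unknown dataset type: %r" % (type_name,))
        return dataset_class(dataset_md=meta_data)


@DatasetSketch.register("time_series")
class TimeSeriesSketch(DatasetSketch):
    pass


loaded = DatasetSketch.load_from_meta_data({"type": "time_series"})
```

A registry keeps the factory free of hard-coded class names; an unregistered type surfaces as UnknownDatasetTypeException rather than a bare KeyError.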
static load_meta_data(dataset_dir, file_name='metadata.yaml')
Load the meta data of the dataset
static store_meta_data(dataset_dir, meta_data, file_name='metadata.yaml')
Stores the meta data of a dataset
add_sample(sample, label, train, split=0, run=0)
Add a sample to this dataset
Adds the sample sample along with its class label label to this dataset.
- The method expects the following parameters:
- sample : The respective data sample
- label : The label of the data sample
- train : If train, this sample has already been used for training
- split : The number of the split this sample belongs to. Defaults to 0.
- run : The run number this sample belongs to. Defaults to 0.
get_split_numbers(current_run=0)
Return the number of the splits
Returns the number of splits contained in this dataset for the given run number current_run
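Because self.data is keyed by (run, split, train/test) tuples, run and split numbers can be derived from the keys alone. The sketch below shows one way to do that with plain functions. It is not the library's code, and since the text does not say whether a count or the individual numbers are returned, returning a sorted list is an assumption of this sketch.

```python
def get_run_numbers(data):
    """Return the sorted run numbers that occur in the keys of data."""
    return sorted({run for run, _split, _kind in data})


def get_split_numbers(data, current_run=0):
    """Return the sorted split numbers that occur for current_run."""
    return sorted({split for run, split, _kind in data if run == current_run})


# Example layout: run 0 has splits 0 and 1, run 1 has only split 0.
data = {
    (0, 0, "train"): ["a"],
    (0, 0, "test"): ["b"],
    (0, 1, "train"): ["c"],
    (1, 0, "train"): ["d"],
}
runs = get_run_numbers(data)
splits_run0 = get_split_numbers(data, current_run=0)
```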
dump(result_path, name)
Dumps this dataset into a file.
Dumps (i.e., pickles) this dataset object into a bz2-compressed file. In contrast to store, this method writes the whole dataset into a single file; no meta data is stored in a separate YAML file.
- The method expects the following parameters:
- result_path : The path to the directory in which the pickle file will be written.
- name : The name of the pickle file
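Pickling an object into a bz2-compressed file needs only the standard library. The following is a minimal sketch of that idea, not the body of dump: the helper names dump_object and load_dumped_object, the file name dataset.pickle and the choice of pickle protocol are assumptions.

```python
import bz2
import os
import pickle
import tempfile


def dump_object(obj, result_path, name):
    """Pickle obj into a bz2-compressed file and return the file path."""
    file_path = os.path.join(result_path, name)
    with bz2.open(file_path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return file_path


def load_dumped_object(file_path):
    """Read back an object written by dump_object."""
    with bz2.open(file_path, "rb") as handle:
        return pickle.load(handle)


# Round trip through a temporary directory.
data = {(0, 0, "train"): [([0.1, 0.2], "A")]}
with tempfile.TemporaryDirectory() as result_path:
    dumped_file = dump_object(data, result_path, "dataset.pickle")
    restored = load_dumped_object(dumped_file)
```

As with any pickle file, such a dump should only be loaded from trusted sources, because unpickling can execute arbitrary code.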
store(result_dir, s_format=None)
Stores this dataset in the directory result_dir.
In contrast to dump, this method does not store the dataset in a single file but as a whole directory structure with meta information etc. The data sets are stored separately for each (run, split, train/test) combination.
- The method expects the following parameters:
- result_dir : The directory in which the dataset will be stored
- s_format : The format in which the actual data sets should be stored.
Note
Must be overridden by subclasses!
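What a subclass's store might do can be sketched as below: one file per (run, split, train/test) key plus a separate meta-data file. The function name store_sketch, the directory and file naming scheme, and the use of JSON (chosen here to stay within the standard library, whereas the documented default meta-data file is metadata.yaml) are all assumptions, not pySPACE's layout.

```python
import json
import os
import tempfile


def store_sketch(data, meta_data, result_dir):
    """Write one JSON file per (run, split, train/test) key plus meta data."""
    os.makedirs(result_dir, exist_ok=True)
    for (run, split, kind), samples in data.items():
        # Hypothetical naming scheme, e.g. data_run0/train_sp1.json
        run_dir = os.path.join(result_dir, "data_run%d" % run)
        os.makedirs(run_dir, exist_ok=True)
        file_name = "%s_sp%d.json" % (kind, split)
        with open(os.path.join(run_dir, file_name), "w") as handle:
            json.dump(samples, handle)
    # The meta data goes into its own file next to the run directories.
    with open(os.path.join(result_dir, "metadata.json"), "w") as handle:
        json.dump(meta_data, handle)


data = {
    (0, 0, "train"): [[0.1, "A"]],
    (0, 0, "test"): [[0.2, "B"]],
    (1, 0, "train"): [[0.3, "A"]],
}
with tempfile.TemporaryDirectory() as result_dir:
    store_sketch(data, {"type": "sketch"}, result_dir)
    # Collect the relative paths of all files that were written.
    written = sorted(
        os.path.relpath(os.path.join(root, name), result_dir)
        for root, _dirs, files in os.walk(result_dir)
        for name in files
    )
```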
__weakref__
list of weak references to the object (if defined)
-
classmethod