base
Module: resources.dataset_defs.base
Base module for datasets; it specifies the interface that all dataset classes share.
Inheritance diagram for pySPACE.resources.dataset_defs.base:
Class Summary
UnknownDatasetTypeException | Wrapper around error, when dataset type is not available
BaseDataset([dataset_md]) | Base class for datasets
Classes
BaseDataset
-
class pySPACE.resources.dataset_defs.base.BaseDataset(dataset_md=None)
Bases: object
Base class for datasets
This class (BaseDataset) acts as the base class for all dataset classes and specifies their interface. Furthermore, it provides a factory method, load, for all types of datasets. load expects the path to the dataset's storage directory.
The following methods must be implemented:
- __init__ : The constructor must take an argument dataset_md, a dictionary containing the meta data for the dataset to be loaded.
- store : A method that stores a dataset in a certain directory. store and __init__ should be written so that __init__ can correctly recreate every dataset stored with store.
- add_sample : (optional) Adds a new sample to the dataset. BaseDataset provides a default implementation.
Datasets store the data in the attribute self.data. This attribute is a dictionary that maps (run, split, train/test) tuples to the actual data obtained in this split in this run for training/testing.
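The layout of self.data described above can be pictured with a plain dictionary. This is only an illustrative sketch, not pySPACE code: the string markers "train" and "test", the (sample, label) pairs used as values, and the helper samples_for are assumptions made for the example.

```python
# Hypothetical illustration of the described layout; not taken from pySPACE.
# Keys are (run, split, "train" or "test") tuples; the marker strings and the
# (sample, label) value pairs are assumptions made for this example.
data = {
    (0, 0, "train"): [([0.1, 0.2], "A"), ([0.3, 0.4], "B")],
    (0, 0, "test"): [([0.5, 0.6], "A")],
    (0, 1, "train"): [([0.7, 0.8], "B")],
}


def samples_for(data, run, split, train):
    """Return the entries stored for one run/split/train-or-test combination."""
    key = (run, split, "train" if train else "test")
    # An absent combination yields an empty list instead of raising KeyError.
    return data.get(key, [])


training_samples = samples_for(data, run=0, split=0, train=True)
missing = samples_for(data, run=1, split=0, train=False)
```

Keying on the full tuple keeps every run, split, and train/test combination independently addressable, which matches the description of data being stored per combination.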
Class Components Summary
__del__() | Remove logging handler
__repr__() | Return a string representation of this class
_log(message[, level]) | Logs the given message with the given logging level
add_sample(sample, label, train[, split, run]) | Add a sample to this dataset
dump(result_path, name) | Dumps this dataset into a file.
get_run_numbers() | Return the number of the runs contained in this dataset
get_split_numbers([current_run]) | Return the number of the splits
load(dataset_dir) | Loads the dataset stored in directory rel_dataset_dir
load_meta_data(dataset_dir[, file_name]) | Load the meta data of the dataset
store(result_dir[, s_format]) | Stores this dataset in the directory result_dir.
store_meta_data(dataset_dir, meta_data[, ...]) | Stores the meta data of a dataset
update_meta_data(meta_data) | Updates the internal meta_data dictionary with meta_data
-
classmethod load(dataset_dir)
Loads the dataset stored in directory dataset_dir
This method loads the dataset stored in the directory dataset_dir. Depending on the type stored in the dataset's meta-data file, the method creates an instance of a specific dataset class.
- The method expects the following parameters:
- dataset_dir : The (absolute) directory in which the dataset that will be loaded is located
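The factory behaviour described for load can be sketched as a lookup from a type name in the meta data to a class. Everything here is hypothetical and stands in for the real implementation: the meta-data key "type", the registry DATASET_TYPES, the toy classes, and the function load_from_meta_data are assumptions. The exception is a local stand-in that merely reuses the name from the class summary.

```python
class UnknownDatasetTypeException(Exception):
    """Local stand-in for the exception named in the class summary."""


class ToyDataset:
    """Hypothetical base class; only mirrors the dataset_md constructor argument."""

    def __init__(self, dataset_md=None):
        self.meta_data = dict(dataset_md or {})


class ToyVectorDataset(ToyDataset):
    pass


class ToySeriesDataset(ToyDataset):
    pass


# Hypothetical registry mapping a type name to the class that handles it.
DATASET_TYPES = {
    "toy_vector": ToyVectorDataset,
    "toy_series": ToySeriesDataset,
}


def load_from_meta_data(meta_data):
    """Create an instance of the class selected by the assumed 'type' entry."""
    type_name = meta_data.get("type")
    try:
        dataset_class = DATASET_TYPES[type_name]
    except KeyError:
        raise UnknownDatasetTypeException("Unknown dataset type: %r" % (type_name,))
    return dataset_class(dataset_md=meta_data)


loaded = load_from_meta_data({"type": "toy_vector"})
```

In this sketch the meta data is passed in directly; reading it from a file in dataset_dir would be a separate step that precedes the dispatch.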
-
static load_meta_data(dataset_dir, file_name='metadata.yaml')
Load the meta data of the dataset
-
static store_meta_data(dataset_dir, meta_data, file_name='metadata.yaml')
Stores the meta data of a dataset
-
add_sample(sample, label, train, split=0, run=0)
Add a sample to this dataset
Adds the sample sample along with its class label label to this dataset.
- The method expects the following parameters:
- sample : The respective data sample
- label : The label of the data sample
- train : If train, this sample has already been used for training
- split : The number of the split this sample belongs to. Defaults to 0.
- run : The run number this sample belongs to. Defaults to 0.
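A minimal sketch of how a method with this signature could file a sample under its (run, split, train/test) key follows. The class ToyDataset, the "train"/"test" marker strings, and the use of defaultdict are assumptions for illustration; the default implementation in BaseDataset may differ.

```python
from collections import defaultdict


class ToyDataset:
    """Hypothetical stand-in that mirrors the documented add_sample signature."""

    def __init__(self, dataset_md=None):
        self.meta_data = dict(dataset_md or {})
        # Missing keys start out as empty lists, so appending always works.
        self.data = defaultdict(list)

    def add_sample(self, sample, label, train, split=0, run=0):
        # The "train"/"test" strings are an assumption made for this sketch.
        key = (run, split, "train" if train else "test")
        self.data[key].append((sample, label))


dataset = ToyDataset()
dataset.add_sample([1.0, 2.0], "Target", train=True)
dataset.add_sample([3.0, 4.0], "Standard", train=False, split=1)
```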
-
get_split_numbers(current_run=0)
Return the number of the splits
Returns the number of splits contained in this dataset for the given run number current_run
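Because the keys of self.data carry the run and split indices, both can be derived from the keys alone. The sketch below is an assumption about one way to do this, not the library's code: the helper names run_indices and split_indices are hypothetical, and they return the distinct indices, from which a count follows via len.

```python
# Hypothetical data dictionary with two runs; run 0 has two splits, run 1 has one.
data = {
    (0, 0, "train"): [], (0, 0, "test"): [],
    (0, 1, "train"): [], (0, 1, "test"): [],
    (1, 0, "train"): [], (1, 0, "test"): [],
}


def run_indices(data):
    """Return the distinct run indices found in the keys, in ascending order."""
    return sorted({run for run, _split, _kind in data})


def split_indices(data, current_run=0):
    """Return the distinct split indices that belong to the given run."""
    return sorted({split for run, split, _kind in data if run == current_run})


runs = run_indices(data)
splits_of_first_run = split_indices(data, current_run=0)
splits_of_second_run = split_indices(data, current_run=1)
```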
-
dump(result_path, name)
Dumps this dataset into a file.
Dumps (i.e. pickles) this dataset object into a bz2-compressed file. In contrast to store, this method writes the whole dataset into a single file. No meta data is stored in a YAML file etc.
- The method expects the following parameters:
- result_path : The path to the directory in which the pickle file will be written.
- name : The name of the pickle file
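Pickling an object into a bz2-compressed file, as described above, can be done with the standard library alone. The helper names dump_object and load_dumped_object, the file name, and the pickle protocol are assumptions for this sketch; the actual dump method may differ in these details.

```python
import bz2
import os
import pickle
import tempfile


def dump_object(obj, result_path, name):
    """Pickle obj into a single bz2-compressed file inside result_path."""
    file_path = os.path.join(result_path, name)
    with bz2.BZ2File(file_path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return file_path


def load_dumped_object(file_path):
    """Read back an object written by dump_object."""
    with bz2.BZ2File(file_path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for a dataset's data dictionary.
toy_data = {(0, 0, "train"): [([1.0, 2.0], "A")]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = dump_object(toy_data, tmp_dir, "toy_dataset.pickle")
    restored = load_dumped_object(path)
```

As with any pickle file, such a dump should only be loaded from a trusted source, since unpickling can execute arbitrary code.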
-
store(result_dir, s_format=None)
Stores this dataset in the directory result_dir.
In contrast to dump, this method stores the dataset not in a single file but as a whole directory structure with meta information etc. The data sets are stored separately for each run, split, train/test combination.
- The method expects the following parameters:
- result_dir : The directory in which the dataset will be stored
- s_format : The format in which the actual data sets should be stored.
Note
Needs to be overridden by the subclasses!
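What a subclass's store might do can be sketched as follows: one meta-data file plus one data file per (run, split, train/test) combination. This is a rough illustration under stated assumptions, not the format any pySPACE dataset actually uses. The function store_toy_dataset and the file-naming scheme are hypothetical, and JSON replaces YAML for the meta data only to keep the example within the standard library.

```python
import json
import os
import pickle
import tempfile


def store_toy_dataset(data, meta_data, result_dir):
    """Write meta data and one file per (run, split, train/test) combination."""
    os.makedirs(result_dir, exist_ok=True)
    # JSON is used here instead of YAML purely to avoid a third-party dependency.
    with open(os.path.join(result_dir, "metadata.json"), "w") as handle:
        json.dump(meta_data, handle)
    written = []
    for (run, split, kind), samples in sorted(data.items()):
        # Hypothetical naming scheme, chosen only for this illustration.
        file_name = "%s_run%d_split%d.pickle" % (kind, run, split)
        with open(os.path.join(result_dir, file_name), "wb") as handle:
            pickle.dump(samples, handle)
        written.append(file_name)
    return written


toy_data = {
    (0, 0, "train"): [([1.0, 2.0], "A")],
    (0, 0, "test"): [([3.0, 4.0], "B")],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    written_files = store_toy_dataset(toy_data, {"type": "toy"}, tmp_dir)
    stored_entries = sorted(os.listdir(tmp_dir))
```

Keeping the meta data next to the per-combination files is what would let a constructor recreate the dataset later, in line with the requirement that __init__ can restore everything written by store.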
-
__weakref__
list of weak references to the object (if defined)
-
classmethod