Installation

Warning

Currently, the software is developed under Ubuntu and Mac OS X. Using a UNIX OS for pySPACE is highly recommended. However, it is also possible to run pySPACE on Windows.

pySPACE is currently developed at the RIC Bremen (part of the DFKI GmbH) and at the AG Robotik of the University of Bremen.

The installation process consists of several steps:

  1. Download and install pySPACE itself
  2. Install the needed extra packages/modules
  3. Prepare the pySPACE user environment

1. Downloading

Step 1 is quite easy: download the software. Currently, there is no real installation script, though this will hopefully change in the future. Simply save the software wherever you want, in a place where you can find it easily. For interactive use of the software, it might also be useful to add the folder path to the PYTHONPATH variable. This is not required for normal usage.
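
For interactive sessions, an alternative to setting PYTHONPATH is to extend sys.path at runtime. The following is a minimal sketch of that idea. The helper name add_to_python_path and the folder ~/software/pyspace are hypothetical and only serve as an illustration; they are not part of pySPACE.

```python
import os
import sys


def add_to_python_path(folder):
    """Prepend ``folder`` to ``sys.path`` unless it is already listed."""
    folder = os.path.abspath(os.path.expanduser(folder))
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# Hypothetical download location; replace it with your own folder.
pyspace_folder = add_to_python_path("~/software/pyspace")
```

Calling the helper twice with the same folder leaves sys.path unchanged the second time.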

2. Extra packages

Besides the standard Python library, some extra packages/modules are needed, which you have to download and install yourself. It is recommended to use a package manager to install these packages (e.g. apt or MacPorts).

Mandatory packages are:

Python2.7:

http://www.python.org/ (the main programming language)

It is important to install the relevant Python packages for the same Python installation and to start pySPACE with this version.

YAML:

http://www.yaml.org/

for reading and writing configuration files

enum:

http://cheeseshop.python.org/pypi/enum/

for robust enumerations in Python

NumPy:

http://www.numpy.org/

basic array handling library to handle data

SciPy:

http://www.scipy.org/

more complicated signal processing or linear algebra operations

# -- for MacPorts users --
$ sudo port install python27
# select correct Python version
$ sudo port select --set python python27
# install other mandatory packages
$ sudo port install py27-yaml py27-enum py27-numpy py27-scipy
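
After installing the packages, it can be useful to check that they are importable from the Python version you intend to use. The following is a small sketch of such a check. The helper find_missing_modules is hypothetical and not part of pySPACE; the import names yaml, enum, numpy and scipy correspond to the mandatory packages listed above.

```python
import importlib


def find_missing_modules(module_names):
    """Return the names from ``module_names`` that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names of the mandatory packages listed above.
MANDATORY_MODULES = ["yaml", "enum", "numpy", "scipy"]

missing = find_missing_modules(MANDATORY_MODULES)
if missing:
    print("Missing mandatory packages: " + ", ".join(missing))
else:
    print("All mandatory packages can be imported.")
```

Run the script with the same interpreter that will later start pySPACE, since each Python installation has its own set of packages.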

Optional packages are:

matplotlib:

http://matplotlib.org/

making fancy plots

scikit-learn:

http://scikit-learn.org/

Many scikit algorithms are available wrapped via the scikit_nodes module and can be used like normal nodes.

Sphinx1.4:

Generate documentation

PyQt4:

basis of the GUIs

LIBSVM:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Well-known library for fast SVM classifiers. Make sure that the Python bindings are installed correctly. If you have access, use the modified version in the external repository.

CVXOPT:

http://abel.ee.ucla.edu/cvxopt/

optimization toolbox, used for the construction of some classifiers, which are described by mathematical programs

mpi4py:

http://mpi4py.scipy.org/

needed if you want to use the MpiBackend

MDP:

http://mdp-toolkit.sourceforge.net/ (tested up to version 3.1)

currently needed only for PCA, ICA, FDA but more could be integrated

external:

collection of slight modifications by the pySPACE developers of existing libraries, e.g. LIBSVM

mlpy:

needed for one feature selection node

tables:

Handle hierarchic data

. . .:

and whatever you want to integrate

Warning

When programming, always make imports of the aforementioned packages optional, use them only within your own module, and add the package to the list above.
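
One common way to keep an import optional is to guard it with a try/except block and to fail only when the feature is actually requested. The following sketch uses CVXOPT from the list above as the example package. The helper names cvxopt_available and get_cvxopt are hypothetical and not part of pySPACE.

```python
try:
    import cvxopt  # optional package, only needed by this module
except ImportError:
    cvxopt = None  # remember that the package is unavailable


def cvxopt_available():
    """Tell whether the optional package could be imported."""
    return cvxopt is not None


def get_cvxopt():
    """Return the cvxopt module or raise a helpful error if it is missing."""
    if cvxopt is None:
        raise ImportError("This feature requires the optional package CVXOPT.")
    return cvxopt
```

With this pattern, importing the module never fails because of the missing package; only the code path that needs it raises an error.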

The User Environment

To provide the software with the necessary information, pySPACE needs one main configuration file and several further specification files. The main configuration file specifies where to find the other specification files, where to load and store data, and a lot more.

Easy Setup of Environment

The default structure of your environment is called the pySPACEcenter. It is installed into your home directory by calling

python setup.py

in the software folder. This will create the folder pySPACEcenter and the relevant subfolders including the relevant default main configuration file and several examples.

Furthermore, links to the three main routines for using the software will be created in this folder:

launch: Main program to run pySPACE
launch_live: Script for running pySPACE live control
gui.performance_results_analysis: Simple GUI for loading and browsing pySPACE.resources.dataset_defs.performance_result

Warning

The name setup.py was chosen with future development in mind, where this script will also be responsible for a real installation, including the installation of dependencies and moving the needed code to the site-packages folder. For now, after running this script, pySPACE is probably not available in IPython or for import in other software or scripts without additional effort.

Customized Setup of Environment

Of course, you can use your own locations for every part. This is done by adapting your PYSPACE_CONF_DIR and your configuration files. Further details on setting up your main configuration file can be found here.
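
To illustrate the idea of a configurable location, the following sketch resolves a configuration directory from the PYSPACE_CONF_DIR environment variable and falls back to ~/pySPACEcenter otherwise. The helper get_conf_dir is hypothetical, and the fallback rule is an assumption made for this example; the actual lookup in pySPACE may differ.

```python
import os


def get_conf_dir(environ=None):
    """Return the configuration directory as an absolute path.

    Uses PYSPACE_CONF_DIR if it is set, otherwise the assumed default
    location ~/pySPACEcenter.
    """
    if environ is None:
        environ = os.environ
    default_dir = os.path.join("~", "pySPACEcenter")  # assumed fallback
    conf_dir = environ.get("PYSPACE_CONF_DIR", default_dir)
    return os.path.abspath(os.path.expanduser(conf_dir))


print(get_conf_dir())
```

Passing a dictionary instead of os.environ makes the function easy to try out without changing your shell environment.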

The Default Configuration File

Here is a documented example of the default configuration file:

---
# This is the standard default configuration file.
# Each possible parameter is mentioned here and its default value.
# Normally the default is quite useful and you won't have to change anything,
# especially, when using the pySPACEcenter default configuration file.

# ===================
# = Main Parameters =
# ===================
# These parameters are the most important for pySPACE.
# The others are only relevant for special components.

# The directory from which data is loaded and to which data is stored.
# Specifying this directory is very important.
# Default: $home_dir/pySPACEcenter/storage
storage:    ~/pySPACEcenter/storage
# The directory in which the configuration/specification files for operations, 
# operation chains, WEKA and pySPACE related options are stored.
# Default: $home_dir/pySPACEcenter/specs
spec_dir:    ~/pySPACEcenter/specs/
# The minimum level a log message must have to be printed to stdout.
# Levels are based on the Python logging package.
# Possible levels are logging.{DEBUG, INFO, WARNING, CRITICAL, ERROR, FATAL}.
# When using backends like the loadl backend, stdout is redirected to a file.
# If you get too much output, just use 'logging.CRITICAL'.
console_log_level:      logging.WARNING
# The minimum level a log message must have to be written to the operation log file.
# This file can then be found in your current result folder.
# Be careful: the file can get quite large when using DEBUG or INFO.
file_log_level:         logging.INFO
# The Python path that should be used during the experiment
# Paths normally available in Python do not have to be mentioned
# This part of setting paths is especially good to use alternative libraries,
# since the paths here get priority.
# Default: empty list
# external_nodes: ["~/pySPACEcenter/external_nodes","/opt/local/lib/python2.7/site-packages/pySPACE_extra_nodes"]
#python_path:
#        - /usr/lib/python2.5/site-packages
#        - /usr/lib/python2.5/lib-dynload/
#        - /usr/lib/python2.5
#        - /var/lib/python-support/python2.5/
#        - /usr/lib/python2.5/lib-tk/

# =========================
# = Node specific options =
# =========================

# If you want to have your own nodes outside the normal pySPACE structure
# this parameter lists external folders which are also scanned for nodes.
# Furthermore, the corresponding path is added to the local system path.
# Note, that still double naming is forbidden and crashes the software.
# See: pySPACE.missions.nodes.external
# external_nodes: ["~/pySPACEcenter/external_nodes"]

# ==============================
# = Operation specific options =
# ==============================

# WEKA operation
# The java class path used for WEKA
# weka_class_path:        ~weka-3-6-0/weka.jar:/home/user/weka-additional

# ============================
# = Backend specific options =
# ============================

# ===Local===

# Number of used CPUs for parallelization.
# By default the total number of available CPUs is used.
# pool_size : 1

# ===LoadLeveler===
# Specify parameters for the loadl backend for the cluster.
#
# Class name of your committed jobs. Default is 'general'.
# Depending on the class and the configuration of the cluster
# jobs with more important class name get a higher priority
job_class: general # one of [ critical, general, longterm]
# Maximal memory one process will use, since this should be known by the 
# scheduler to appropriately start more jobs or NOT. The value only affects
# the scheduling (loadleveler) but not the system, so nothing will happen to
# your jobs if they exceed the specified value.
# Default is set to the available memory divided by the number of CPUs of one
# blade (3250mb). If you do not expect such large memory, decrease the value.
consumable_memory: 3250mb # number and unit (gb,mb,kb)
# Maximal number of CPUs one job needs, since this should be known by the 
# scheduler to appropriately start more jobs or NOT. The value only affects
# the scheduling (loadleveler) but not the system, so nothing will happen to
# your jobs if they exceed the specified value.
# The default is 1.
consumable_cpus: 1
# optionally specify which nodes are used for calculation, e.g. 
# anodes: (Machine == "anode05.dfki.uni-bremen.de") || (Machine == "anode02.dfki.uni-bremen.de")


first_call : True # Internal Parameter for first call of software to give detailed information and welcome screen. It should remain at the last line!
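
The log level entries in the configuration file are written as strings such as logging.WARNING. As an illustration of how such a value can be mapped to a level of the Python logging package, consider the following sketch. The helper parse_log_level is hypothetical; how pySPACE itself evaluates these entries is not shown here.

```python
import logging

# Level names mentioned in the configuration file comments.
ALLOWED_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", "FATAL")


def parse_log_level(value):
    """Translate a string such as 'logging.WARNING' into a logging level."""
    name = value.strip()
    if name.startswith("logging."):
        name = name[len("logging."):]
    if name not in ALLOWED_LEVELS:
        raise ValueError("Unknown log level: %r" % value)
    return getattr(logging, name)


console_level = parse_log_level("logging.WARNING")
file_level = parse_log_level("logging.INFO")
```

Restricting the lookup to a fixed set of names avoids evaluating arbitrary strings from the configuration file.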

pySPACE on Windows

pySPACE can also be used under a Windows operating system. The first and most important dependency that must be installed is a Python bundle that (preferably) comes with a large part of the heavyweight dependencies needed by pySPACE, e.g. numpy and matplotlib. The other two must-have dependencies are Git management software and either a text editor or an IDE that can edit .py and .yaml files. Below is a list of the software we recommend using when building and running pySPACE:

Besides these, depending on the specific nodes that you plan to use, further dependencies might be necessary. Note that when the setup script is launched, it might blacklist some of the available nodes due to missing dependencies. Should you install these missing dependencies, you can refresh the blacklist by running the setup script with the -b option enabled:

python setup.py -b

This will refresh the blacklisted nodes and overwrite the previous list. For more details on the setup script, run it with the help option:

python setup.py -h

It should be noted that pySPACE is developed under and for UNIX systems. Therefore, Windows support for the software suite is limited. Nonetheless, the basic functionality of the software is available under Windows and, depending on the availability of the necessary dependencies, the software can be used to its full extent on Windows systems.

Note

When writing new nodes, special attention should be given to OS-independent implementations. For example, the numpy.float128 precision is not available under Windows. There is, however, an alternative floating point type, numpy.longdouble, which serves the same purpose yet is OS-independent. This is merely an example to show the motivation behind an OS-independent approach, but the general idea is the same for other issues: if there is an OS-independent approach, it should be favored in the implementation.
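
The following sketch shows how a node could request the long double type by its OS-independent name. The helper to_extended_precision is hypothetical and not part of pySPACE, and the NumPy import is guarded so that the sketch stays importable without NumPy. Note that the actual precision of numpy.longdouble depends on the platform and compiler.

```python
try:
    import numpy
except ImportError:
    numpy = None  # keep the sketch importable even without NumPy


def to_extended_precision(values):
    """Convert ``values`` to an array of the platform's long double type."""
    if numpy is None:
        raise ImportError("NumPy is required for this example.")
    # numpy.longdouble exists on every platform, unlike numpy.float128.
    return numpy.asarray(values, dtype=numpy.longdouble)


if numpy is not None:
    data = to_extended_precision([1.0, 2.5])
    print(data.dtype)
```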

Note

Another very important example of an OS-independent approach is using os.sep from the Python os module whenever file paths are handled. Since Windows and UNIX systems use different path separators, whenever new nodes are written and some sort of path manipulation is necessary, the separator must be obtained from os.sep and must not be hardcoded in the Python script.
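
As a brief illustration, the sketch below builds and splits a path without hardcoding a separator. The function names build_result_path and split_path, as well as the example folder names, are hypothetical. os.path.join is used for building paths since it inserts os.sep automatically.

```python
import os


def build_result_path(base_dir, *parts):
    """Join path components with the separator of the current OS."""
    return os.path.join(base_dir, *parts)


def split_path(path):
    """Split a normalized path into its components using os.sep."""
    return os.path.normpath(path).split(os.sep)


# Hypothetical relative path inside a storage folder.
result_file = build_result_path("storage", "results", "run.csv")
print(result_file)
print(split_path(result_file))
```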

Terminal usage

Under Windows, pySPACE can be used from the terminal in the same manner as under a UNIX system. Thanks to OS-independent Python shortcuts, once you have prepared your data and operation chain, you can save and execute them from the pySPACEcenter. While pySPACE is primarily a UNIX-oriented software package, the contributors strive to write OS-independent Python scripts.

Writing new nodes

Should you want to develop new pySPACE nodes under Windows, please be aware that certain software packages only run on Windows systems while others only run on UNIX (Mac, Linux) systems. These OS-specific packages should be avoided as much as possible. If such a package is absolutely necessary, please consider wrapping it in an if clause that first determines the OS in use and then chooses the appropriate implementation. In most cases, though, there is an OS-independent implementation, which is definitely the preferred version.
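
Such an if clause could look like the following sketch, which checks sys.platform from the Python standard library. The functions is_windows and choose_implementation are hypothetical, and the returned strings only stand in for the OS-specific code paths.

```python
import sys


def is_windows(platform_name=None):
    """Return True if the given (or current) platform is Windows."""
    if platform_name is None:
        platform_name = sys.platform
    # sys.platform is "win32" on Windows, e.g. "linux" or "darwin" otherwise.
    return platform_name.startswith("win")


def choose_implementation(platform_name=None):
    """Pick the OS-specific code path; the strings are placeholders."""
    if is_windows(platform_name):
        return "windows"
    return "unix"


print(choose_implementation())
```

Keeping the OS check in one small function makes it easy to find and test all OS-specific branches later.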

Next Steps

After the installation you might want to