base

Module: missions.nodes.classification.base

Base classes for classification

[Figure: inheritance diagram for pySPACE.missions.nodes.classification.base]

Class Summary

RegularizedClassifierBase([regression, ...]) Basic class for regularized (kernel) classifiers with extra support in the linear case
TimeoutException Break off simplex iterations that take too long

Classes

RegularizedClassifierBase

class pySPACE.missions.nodes.classification.base.RegularizedClassifierBase(regression=False, complexity=1, weight=None, kernel_type='LINEAR', exponent=2, gamma=None, offset=0, nu=0.5, epsilon=0.1, class_labels=None, debug=False, max_time=3600, tolerance=0.001, complexities_path=None, keep_vectors=False, use_list=False, multinomial=False, add_type='ADD_ALL', discard_type='REMOVE_OLDEST', keep_only_sv=False, basket_size=inf, relabel=False, border_handling='USE_ONLY_BORDER_POINTS', scale_factor_small=0.3, scale_factor_tall=0.5, p_threshold=0.8, show_plot=False, save_plot=False, cdt_threshold=10, u_retrain=False, training_set_ratio='DONT_HANDLE_RATIO', plot_storage='./plot_storage', ratio=0.5, **kwargs)[source]

Bases: pySPACE.missions.nodes.base_node.BaseNode

Basic class for regularized (kernel) classifiers with extra support in the linear case

This module also implements several data handling strategies that keep the set of training samples limited, especially in an online learning scenario. These strategies were used in the Data Selection Strategies publication. The functionality is currently implemented for the LibSVMClassifierNode and the SorSvmNode. It requires replacing the _complete_training

References

main source: Data Selection Strategies
author Krell, M. M. and Wilshusen, N. and Ignat, A. C., and Kim, S. K.
title Comparison of Data Selection Strategies For Online Support Vector Machine Classification
book Proceedings of the International Congress on Neurotechnology, Electronics and Informatics
publisher SciTePress
year 2015
doi 10.5220/0005650700590067

Parameters

class_labels:

Sets the labels of the classes. This can be done automatically, but setting them explicitly is better if you want comparable prediction values for classifiers trained on different sets. Otherwise this variable is built up in the order in which the labels occur. Furthermore, the important class (ir_class) should get the second position in the list, so that the classifier gives it the higher prediction values.

(recommended, default: [])

complexity:

Complexity sets the weighting of punishment for misclassification in comparison to generalizing classification from the data. Value in the range from 0 to infinity.

(optional, default: 1)

weight:

Defines an array with two entries to give different complexity weights to the two classes. The parameter C of class i is set to weight[i]*C.

(optional, default: [1,1])

kernel_type:

Defines the used kernel function. One of the following strings: ‘LINEAR’, ‘POLY’, ‘RBF’, ‘SIGMOID’.

  • LINEAR

    u'*v
    
  • POLY

    (gamma*u'*v + offset)^exponent
    
  • RBF

    exp(-gamma*|u-v|^2)
    
  • SIGMOID

    tanh(gamma*u'*v + offset)
    

(optional, default: ‘LINEAR’)
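
As an illustration of the four kernel formulas above, the following minimal sketch evaluates them for plain Python sequences. The function name kernel and its signature are assumptions made for this example; this is not the pySPACE implementation.

```python
import math


def dot(u, v):
    """Inner product u'*v of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def kernel(u, v, kernel_type="LINEAR", gamma=1.0, offset=0.0, exponent=2):
    """Evaluate one of the kernels LINEAR, POLY, RBF or SIGMOID (illustrative helper)."""
    if kernel_type == "LINEAR":
        return dot(u, v)
    if kernel_type == "POLY":
        return (gamma * dot(u, v) + offset) ** exponent
    if kernel_type == "RBF":
        # |u-v|^2 is the squared Euclidean distance
        sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-gamma * sq_dist)
    if kernel_type == "SIGMOID":
        return math.tanh(gamma * dot(u, v) + offset)
    raise ValueError("unknown kernel_type: %r" % (kernel_type,))
```

For identical vectors the RBF kernel returns 1, and it decays towards 0 as the distance between u and v grows.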

exponent:

Defines parameter for the ‘POLY’-kernel. Equals parameter /degree/ in libsvm-package.

(optional, default: 2)

gamma:

Defines parameter for ‘POLY’-, ‘RBF’- and ‘SIGMOID’-kernel. In the libsvm package it was set to 1/num_features.

For RBF kernels we calculate it as described in:

Paper: A practical Approach to Model Selection for Support vector Machines with a Gaussian Kernel
Author: M. Varewyck and J.-P. Martens
Formula: 15

The quasi-optimal complexity should then be found in [0.5,2,8], that is, log_2 C should be found in [-1,1,3]. To test a wider range, you may try [-2,...,4]. A less accurate choice of gamma would be 1/(num_features*sqrt(2)).

For the other kernels we set it to 1/num_features.

Warning

For the RBF parameter selection, the HistogramFeatureNormalizationNode should be used beforehand.

(optional, default: None)
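
The fallback values for gamma and the suggested complexity grid can be written down directly. The sketch below covers only the simple rules named in the text (1/num_features and the less accurate RBF variant); it does not reproduce formula 15 of the cited paper, and the helper name fallback_gamma is hypothetical.

```python
import math


def fallback_gamma(num_features, kernel_type="RBF"):
    """Simple gamma defaults from the text above (not formula 15 of the paper)."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    if kernel_type == "RBF":
        # the "less accurate version" mentioned for RBF kernels
        return 1.0 / (num_features * math.sqrt(2))
    # POLY and SIGMOID: 1/num_features
    return 1.0 / num_features


# Complexity candidates expressed through log_2 C
suggested_complexities = [2.0 ** e for e in (-1, 1, 3)]   # [0.5, 2.0, 8.0]
wider_complexities = [2.0 ** e for e in range(-2, 5)]     # log_2 C = -2, ..., 4
```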

offset:

Defines parameter for ‘POLY’- and ‘SIGMOID’-kernel. Equals parameter /coef0/ in libsvm-package.

(optional, default: 0)

nu:

Defines parameter for ‘nu-SVC’, ‘one-class SVM’ and ‘nu-SVR’. It approximates the fraction of training errors and support vectors. Value in the range from 0 to 1.

(optional, default: 0.5)

epsilon:

Defines parameter for ‘epsilon-SVR’. Set the epsilon in loss function of epsilon-SVR. Equals parameter /p/ in libsvm-package.

(optional, default: 0.1)

tolerance:

Tolerance of the termination criterion; same default as in libsvm.

In the SOR implementation the tolerance may be reduced to one tenth of the complexity, if it is higher than this value. Otherwise it would be no valid stopping criterion.

(optional, default: 0.001)
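
The adjustment described for the SOR implementation might look like the following sketch. The helper name effective_tolerance is hypothetical, and the rule is only a reading of the sentence above, not code taken from the SorSvmNode.

```python
def effective_tolerance(tolerance, complexity):
    """Cap the tolerance at one tenth of the complexity (assumed reading)."""
    return min(tolerance, complexity / 10.0)
```

With the defaults (tolerance 0.001, complexity 1) nothing changes; only for very small complexities does the cap become active.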

max_time:

Time limit for the construction of the classifier. For LibSVM we restrict the number of steps, but for cvxopt we use signal handling to stop processes. This may happen when the parameters are badly chosen or the problem matrix is too large.

Parameter is still in testing and implementation phase.

The time is given in seconds and as a default, one hour is used.

(optional, default: 3600)

keep_vectors:

After training, the training data is normally deleted unless this variable is set to True.

(optional, default: False)

use_list:

Switch to store samples as list. If set to False they are stored as arrays. Used for compatibility with LIBSVM. This parameter should not be changed by the user.

(optional, default: False)

multinomial:

Accept more than two classes.

(optional, default: False)

add_type:

In case the classifier should be retrained, this parameter specifies which incoming samples should be added to the training set. One of the following strings ‘ADD_ALL’, ‘ONLY_MISSCLASSIFIED’, ‘ONLY_WITHIN_MARGIN’, ‘UNSUPERVISED_PROB’.

  • ADD_ALL

    Add all incoming samples.

  • ONLY_MISSCLASSIFIED

    Add only those samples that were misclassified by the current decision function.

    References

    minor  
    author Bordes, Antoine and Ertekin, Seyda and Weston, Jason and Bottou, Léon
    title Fast Kernel Classifiers with Online and Active Learning
    journal J. Mach. Learn. Res.
    volume 6
    month dec
    year 2005
    issn 1532-4435
    pages 1579–1619
    numpages 41
    publisher JMLR.org
  • ONLY_WITHIN_MARGIN

    Add only samples that lie within the margin of the SVM.

    References

    main  
    author Bordes, Antoine and Ertekin, Seyda and Weston, Jason and Bottou, Léon
    title Fast Kernel Classifiers with Online and Active Learning
    journal J. Mach. Learn. Res.
    volume 6
    month dec
    year 2005
    issn 1532-4435
    pages 1579–1619
    numpages 41
    publisher JMLR.org
    main  
    author Oskoei, M.A. and Gan, J.Q. and Huosheng Hu
    booktitle Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE
    title Adaptive schemes applied to online SVM for BCI data classification
    year 2009
    month Sept
    pages 2600-2603
    ISSN 1557-170X
  • UNSUPERVISED_PROB

    Predict the label with the current decision function and determine how probable this decision is. If it is most likely correct, which means the probability exceeds a threshold, add the sample to the training set.

    References

    main  
    author Spüler, Martin and Rosenstiel, Wolfgang and Bogdan, Martin
    year 2012
    isbn 978-3-642-33268-5
    booktitle Artificial Neural Networks and Machine Learning - ICANN 2012
    volume 7552
    series Lecture Notes in Computer Science
    editor Villa, Alessandro E.P. and Duch, Wlodzislaw and Érdi, Péter and Masulli, Francesco and Palm, Günther
    title Adaptive SVM-Based Classification Increases Performance of a MEG-Based Brain-Computer Interface (BCI)
    publisher Springer Berlin Heidelberg
    pages 669-676
    language English

(optional, default: “ADD_ALL”)
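
The four add_type criteria can be sketched as a single decision function. The sketch assumes labels of -1 and +1, a signed decision value f(x) of the current classifier and a margin at |f(x)| = 1; the function name should_add and its arguments are hypothetical and do not mirror the pySPACE methods.

```python
def should_add(add_type, decision_value, label=None, probability=None,
               p_threshold=0.8):
    """Decide whether an incoming sample joins the training set (illustration)."""
    if add_type == "ADD_ALL":
        return True
    if add_type == "ONLY_MISSCLASSIFIED":
        if label is None:
            raise ValueError("a true label is needed to detect misclassification")
        # wrong side of the hyperplane (or exactly on it)
        return label * decision_value <= 0
    if add_type == "ONLY_WITHIN_MARGIN":
        # strictly between the two margin hyperplanes
        return abs(decision_value) < 1.0
    if add_type == "UNSUPERVISED_PROB":
        if probability is None:
            raise ValueError("a probability estimate is needed")
        return probability > p_threshold
    raise ValueError("unknown add_type: %r" % (add_type,))
```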

discard_type:

In case the classifier should be retrained this parameter specifies which samples from the training set should be discarded to keep the training set small. One of the following strings ‘REMOVE_OLDEST’, ‘REMOVE_FARTHEST’, ‘REMOVE_NO_BORDER_POINTS’, ‘INC’, ‘INC_BATCH’, ‘CDT’, ‘DONT_DISCARD’.

  • REMOVE_OLDEST

    Remove the oldest sample from the training set.

    References

    main  
    title Online weighted LS-SVM for hysteretic structural system identification
    journal Engineering Structures
    volume 28
    number 12
    pages 1728 - 1735
    year 2006
    issn 0141-0296
    author He-Sheng Tang and Song-Tao Xue and Rong Chen and Tadanobu Sato
    minor  
    author Van Vaerenbergh, S. and Via, J. and Santamaria, I.
    booktitle Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
    title A Sliding-Window Kernel RLS Algorithm and Its Application to Nonlinear Channel Identification
    year 2006
    month May
    volume 5
    ISSN 1520-6149
    minor  
    author Funaya, Hiroyuki and Nomura, Yoshihiko and Ikeda, Kazushi
    booktitle ICONIP (1)
    date 2009-10-26
    editor Köppen, Mario and Kasabov, Nikola K. and Coghill, George G.
    isbn 978-3-642-02489-4
    keywords dblp
    pages 929-936
    publisher Springer
    series Lecture Notes in Computer Science
    title A Support Vector Machine with Forgetting Factor and Its Statistical Properties.
    volume 5506
    year 2008
    minor  
    title On-Line One-Class Support Vector Machines. An Application to Signal Segmentation
    author Gretton, A and Desobry, F
    year 2003
    date 2003-04
    journal IEEE ICASSP Vol. 2
    pages 709–712
  • INC

    Don’t remove any sample, but retrain the SVM/classifier incrementally with each incoming sample.

    References

    main  
    year 2012
    isbn 978-3-642-34155-7
    booktitle Advances in Intelligent Data Analysis XI
    volume 7619
    series Lecture Notes in Computer Science
    editor Hollmén, Jaakko and Klawonn, Frank and Tucker, Allan
    title Batch-Incremental versus Instance-Incremental Learning in Dynamic and Evolving Data
    publisher Springer Berlin Heidelberg
    author Read, Jesse and Bifet, Albert and Pfahringer, Bernhard and Holmes, Geoff
    pages 313-323
  • CDT

    Detect changes in the distribution of the data and adapt the classifier accordingly by discarding old samples and using only the last few for retraining.

    References

    main  
    author Alippi, C. and Derong Liu and Dongbin Zhao and Li Bu
    journal Systems, Man, and Cybernetics: Systems, IEEE Transactions on
    title Detecting and Reacting to Changes in Sensing Units: The Active Classifier Case
    year 2014
    month March
    volume 44
    number 3
    pages 353-362
    ISSN 2168-2216
    minor  
    title Intelligence for embedded systems: a methodological approach
    author Cesare Alippi
    publisher Springer
    address Cham [u.a.]
    year 2014
    ISBN 978-3-319-05278-6
    pages 211-247
    chapter Learning in Nonstationary and Evolving Environments
  • INC_BATCH

    Collect new samples until a basket size is reached. Then discard all old samples and retrain the classifier with the current training set.

    References

    main  
    year 2012
    isbn 978-3-642-34155-7
    booktitle Advances in Intelligent Data Analysis XI
    volume 7619
    series Lecture Notes in Computer Science
    editor Hollmén, Jaakko and Klawonn, Frank and Tucker, Allan
    title Batch-Incremental versus Instance-Incremental Learning in Dynamic and Evolving Data
    publisher Springer Berlin Heidelberg
    author Read, Jesse and Bifet, Albert and Pfahringer, Bernhard and Holmes, Geoff
    pages 313-323
  • DONT_DISCARD

    Don’t remove any samples from the training set.

  • REMOVE_FARTHEST

    Remove the sample that is farthest away from the hyperplane.

  • REMOVE_NO_BORDER_POINTS

    Remove all points that are not in the border of their class.

    References

    main  
    title Incremental SVM based on reserved set for network intrusion detection
    journal Expert Systems with Applications
    volume 38
    number 6
    pages 7698 - 7707
    year 2011
    issn 0957-4174
    author Yang Yi and Jiansheng Wu and Wei Xu

(optional, default: “REMOVE_OLDEST”)
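
Three of the simpler discard strategies can be sketched as index selection on a size-limited training set. The sketch assumes that samples are stored oldest first and that the absolute decision value |f(x)| is used as the distance to the hyperplane (it differs from the geometric distance only by the constant factor 1/||w||). The function name indices_to_discard is hypothetical.

```python
def indices_to_discard(decision_values, discard_type, basket_size):
    """Return the indices to drop so that at most basket_size samples remain."""
    excess = len(decision_values) - basket_size
    if discard_type == "DONT_DISCARD" or excess <= 0:
        return []
    if discard_type == "REMOVE_OLDEST":
        # samples are assumed to be stored in order of arrival
        return list(range(excess))
    if discard_type == "REMOVE_FARTHEST":
        by_distance = sorted(range(len(decision_values)),
                             key=lambda i: abs(decision_values[i]),
                             reverse=True)
        return sorted(by_distance[:excess])
    raise ValueError("strategy not covered by this sketch: %r" % (discard_type,))
```

With the default basket_size of infinity, excess is never positive and nothing is discarded.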

keep_only_sv:

Because only the support vectors determine the decision function, remove all other samples after the SVM is trained.

(optional, default: False)

basket_size:

Specify the number of training samples for retraining.

(optional, default: infinity)

relabel:

Relabel the training set after the SVM is trained. If the parameter is set to True, the relabeling is done once. If the parameter is set to ‘conv’, relabeling is repeated until convergence (with a maximum of 10 iterations over the complete training data to ensure stopping). The maximum number of iterations is reset after each relabeling.

(optional, default: False)

border_handling:
 

Specify how to determine border points in case the discard_type ‘REMOVE_NO_BORDER_POINTS’ is selected. One of the following strings: ‘USE_ONLY_BORDER_POINTS’, ‘USE_DIFFERENCE’.

  • USE_ONLY_BORDER_POINTS

    Keep only those points whose distance to the center lies within a specified range.

  • USE_DIFFERENCE

    Use the difference from the center of the class as criterion to determine the border points of the class.

(optional, default: USE_ONLY_BORDER_POINTS)

scale_factor_small:
 

Factor to specify the distance of the inner border to the center of a class.

This should be smaller than scale_factor_tall.

inner border = scale_factor_small * distance between centers

(optional, default: 0.3)

scale_factor_tall:
 

Factor to specify the distance of the outer border to the center of a class.

This should be greater than scale_factor_small.

outer border = scale_factor_tall * distance between centers

(optional, default: 0.5)
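
The two border formulas can be combined into a small selection routine. This is a sketch under the assumption that a border point is one whose distance to its own class center lies between the inner and the outer border; the function names are hypothetical and the code is not taken from remove_no_border_points.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def border_point_indices(points, own_center, other_center,
                         scale_factor_small=0.3, scale_factor_tall=0.5):
    """Indices of points whose distance to own_center lies in the border ring."""
    center_distance = euclidean(own_center, other_center)
    inner_border = scale_factor_small * center_distance
    outer_border = scale_factor_tall * center_distance
    return [i for i, p in enumerate(points)
            if inner_border <= euclidean(p, own_center) <= outer_border]
```

For class centers at (0, 0) and (10, 0) and the default factors, the ring covers distances from 3 to 5 around the own center.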

p_threshold:

Probability threshold for unsupervised learning. Only data whose classification is most likely correct (p>p_threshold) will be added to the training set.

(optional, default: 0.8)

cdt_threshold:

Specify the multiple of the number of support vectors at which the SVM is retrained anyway, regardless of whether something changed.

(optional, default: 10)

training_set_ratio:
 

Handle the ratio of the classes. One of the following strings: “DONT_HANDLE_RATIO”, “KEEP_RATIO_AS_IT_IS”, “BALANCED_RATIO”

  • DONT_HANDLE_RATIO

    Don’t handle the ratio between the classes and don’t consider the class labels of the samples.

  • KEEP_RATIO_AS_IT_IS

    Don’t change the ratio between the classes. If a sample from one class is added, another sample from the same class will be removed from the training set.

  • BALANCED_RATIO

    Try to keep a balanced training set with just as many positive samples as negatives.

(optional, default: “DONT_HANDLE_RATIO”)

u_retrain:

For the retraining, the given label is not used; it is replaced with the prediction of the current classifier. This option is of interest where no true label can be provided and a fake label is used instead. It is related to the parameters p_threshold and relabel. The latter allows for a correction of a possibly wrong label, and the former avoids using predictions that are too uncertain.

The retrain parameter has to be additionally set to True for this parameter to become really active.

(optional, default: False)

show_plot:

Plot the samples and the decision function.

(optional, default: False)

save_plot:

Save the plot of the samples and the decision function.

(optional, default: False)

plot_storage:

Specify a directory to store the images of the plots. If the directory does not exist, it will be created.

(optional, default: ”./plot_storage”)

Note

Not all parameter effects are implemented for all inheriting nodes. Kernels are available for LibSVMClassifierNode and partially for other nodes. The tolerance only has an effect on the Liblinear, LibSVM and SOR classifiers.

Input: FeatureVector
Output: PredictionVector
Author: Mario Krell (mario.krell@dfki.de)
Created: 2012/03/28

Class Components Summary

_RegularizedClassifierBase__intersect(rect, line) Calculate the points of a line in a given rectangle
__getstate__() Return a picklable state for this object
__hyperparameters
__setstate__(sdict) Restore object from its pickled state
_batch_retrain(data_list, label_list) Simply add the new data to the old data and retrain
_execute(x) Executes the classifier on the given data vector in the linear case
_inc_train(data[, class_label]) Manipulation of training set for updating the svm
_train(data, class_label) Add a new sample with associated label to the training set.
_train_sample(data, class_label) Train the classifier on the given data sample
adapt_training_set(data[, class_label]) Select the samples that should belong to the training set and retrain the classifier.
add_new_sample(data[, class_label, default]) Add a new sample to the training set
append_sample(sample) Some methods need a list of arrays as lists and some prefer arrays
approximate_AB_for_plat_prob(deci, label, ...) Approximate the distribution of both classes
calculate_gamma() Calculate default gamma
change_detection_test(data[, class_label]) Detect a change of the distribution
circle_out(x, y[, s]) Circle out points with size ‘s’.
delete_training_data() Check if training data can be deleted to save memory
discard_data(data, class_label, ...[, label]) Discard data from training set according to some criteria.
get_own_transformation([sample]) Use classification function e.g. for visualization in LINEAR case
get_platt_prob(deci, label, prior1, prior0, data) Get a probability for the decision of the svm
get_sensor_ranking() Transform the classification vector to a sensor ranking
incremental_training(data, class_label) Warm Start Implementation by Mario Michael Krell
is_potential_support_vector(data[, class_label]) Check whether the given data could become a support vector
is_supervised() Returns whether this node requires supervised training
is_trainable() Returns whether this node is trainable
kernel_func(u, v) Returns the kernel function applied on u and v
learn_CDT() Learn features of the training set to detect changes in the underlying distribution
plot_class_borders(mStandard, mTarget, R, ...) Plot the borders of each class.
plot_data(x, y, target[, s]) Plot points with size ‘s’
plot_data_3D(x, y, z, target[, s]) Plot points with size ‘s’
plot_hyperplane() Plot the hyperplane (in 2D a line).
plot_hyperplane_3D() Plot the hyperplane (in 3D a surface).
plot_line(coef, \*args, \*\*kwargs) Plot a line (y=a*x+b or a*x+b*y+c=0) with the given coefficients
plot_samples() Plot all training samples.
plot_samples_3D() Plot all training samples.
plot_support_vectors() Mark the support vectors by a circle.
print_variables() Debug function for printing the classifier and the slack variables
relabel_training_set() Relabel the training set according to the current decision function.
remove_no_border_points(retraining_required) Discard method to remove all samples from the training set that are not in the border of their class.
remove_non_support_vectors() Remove all samples that are not support vectors
remove_samples(idxs) Remove the samples at the given indices from the training set
retrain(data, class_label, ...) Start retraining procedure if the training set changed.
retrain_SVM() Retrain the svm with the current training set
select_new_data(data, class_label) Add the new sample to the training set if it satisfies some criteria.
stop_training() Wrapper around stop training for measuring times
store_state(result_dir[, index]) Stores this node in the given directory result_dir
train(data, label) Special mapping for multi-class classification
visualize() Show the training samples, the support vectors if possible and the current decision function
__init__(regression=False, complexity=1, weight=None, kernel_type='LINEAR', exponent=2, gamma=None, offset=0, nu=0.5, epsilon=0.1, class_labels=None, debug=False, max_time=3600, tolerance=0.001, complexities_path=None, keep_vectors=False, use_list=False, multinomial=False, add_type='ADD_ALL', discard_type='REMOVE_OLDEST', keep_only_sv=False, basket_size=inf, relabel=False, border_handling='USE_ONLY_BORDER_POINTS', scale_factor_small=0.3, scale_factor_tall=0.5, p_threshold=0.8, show_plot=False, save_plot=False, cdt_threshold=10, u_retrain=False, training_set_ratio='DONT_HANDLE_RATIO', plot_storage='./plot_storage', ratio=0.5, **kwargs)[source]
stop_training()[source]

Wrapper around stop training for measuring times

is_trainable()[source]

Returns whether this node is trainable

is_supervised()[source]

Returns whether this node requires supervised training

delete_training_data()[source]

Check if training data can be deleted to save memory

__getstate__()[source]

Return a picklable state for this object

store_state(result_dir, index=None)[source]

Stores this node in the given directory result_dir

__setstate__(sdict)[source]

Restore object from its pickled state

get_sensor_ranking()[source]

Transform the classification vector to a sensor ranking

This method will fail if the classification vector variable self.features does not exist. This is, for example, the case when using nonlinear classification with kernels.

_train(data, class_label)[source]

Add a new sample with associated label to the training set.

If neither incremental learning nor a restriction of the training samples is used, add the sample to the training set. Otherwise, check whether the classifier is already trained and, if so, select an appropriate training set and retrain the classifier. If the classifier is not trained, train it when there are enough samples available.

Parameters:
  • data (list of float) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
_train_sample(data, class_label)[source]

Train the classifier on the given data sample

It is assumed that the class_label parameter contains information about the true class the data belongs to.

Parameters:
  • data (FeatureVector) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
train(data, label)[source]

Special mapping for multi-class classification

It enables label filtering for the one vs. REST and the one vs. one case. Furthermore, the method measures time for the training segments.

append_sample(sample)[source]

Some methods need a list of arrays as lists and some prefer arrays

_execute(x)[source]

Executes the classifier on the given data vector in the linear case

prediction value = <w,data>+b
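
The prediction rule can be sketched in a few lines. The function names are hypothetical, and the mapping of a positive value to the second entry of class_labels is an assumption based on the description of the class_labels parameter, not a statement about the actual _execute code.

```python
def predict_linear(w, b, data):
    """Return the prediction value <w, data> + b for one sample."""
    if len(w) != len(data):
        raise ValueError("w and data must have the same length")
    return sum(wi * xi for wi, xi in zip(w, data)) + b


def predicted_label(score, class_labels):
    # Assumption: the second label belongs to the class with the
    # higher prediction values, so a positive score maps to it.
    return class_labels[1] if score > 0 else class_labels[0]
```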

get_own_transformation(sample=None)[source]

Use classification function e.g. for visualization in LINEAR case

_inc_train(data, class_label=None)[source]

Manipulation of training set for updating the svm

_batch_retrain(data_list, label_list)[source]

Simply add the new data to the old data and retrain

print_variables()[source]

Debug function for printing the classifier and the slack variables

kernel_func(u, v)[source]

Returns the kernel function applied on u and v

  • POLY

    (gamma*u'*v + offset)^exponent
    
  • RBF

    exp(-gamma*|u-v|^2)
    
  • SIGMOID

    tanh(gamma*u'*v + offset)
    
calculate_gamma()[source]

Calculate default gamma

This defines a parameter for the ‘POLY’-, ‘RBF’- and ‘SIGMOID’-kernel. We calculate the parameter gamma as described in the base node description.

adapt_training_set(data, class_label=None)[source]

Select the samples that should belong to the training set and retrain the classifier.

For incremental training run through four steps.

  1. Add samples to the training set according to some criteria.
  2. Discard samples from the training set according to some criteria.
  3. Retrain the classifier with the current training set.
  4. If enabled, relabel the training set according to the current decision function.
Parameters:
  • data (list of float) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
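
The four steps listed for incremental training can be sketched as a generic routine in which every strategy is passed in as a callable. All names and the signature are hypothetical; the sketch only illustrates the order of the steps and is not the pySPACE method of the same name.

```python
def adapt_training_set(train_x, train_y, sample, label,
                       should_add, pick_discards, fit, relabel=None):
    """Run add, discard, retrain and (optionally) relabel on in-place lists."""
    changed = False
    # 1. Add the sample if it satisfies the selection criterion.
    if should_add(sample, label):
        train_x.append(sample)
        train_y.append(label)
        changed = True
    # 2. Discard samples chosen by the discard strategy (highest index first).
    for idx in sorted(pick_discards(train_x, train_y), reverse=True):
        del train_x[idx]
        del train_y[idx]
        changed = True
    if not changed:
        return None
    # 3. Retrain on the current training set.
    model = fit(train_x, train_y)
    # 4. Optionally relabel with the new decision function.
    if relabel is not None:
        train_y[:] = [relabel(model, x) for x in train_x]
    return model
```

In the test-style usage one might pass a function that always returns True for should_add and one that returns index 0 once the set exceeds a fixed size, which amounts to a sliding window.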
select_new_data(data, class_label)[source]

Add the new sample to the training set if it satisfies some criteria.

Parameters:
  • data (list of float) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
Return type:

[flag if new data is in training set, flag if retraining is required (the new point is a potential sv or a removed one was a sv)]

discard_data(data, class_label, new_data_in_training_set, retraining_required, label=None)[source]

Discard data from training set according to some criteria.

Parameters:
  • data (list of float) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
  • new_data_in_training_set (bool) – flag if new data is in training set
  • retraining_required (bool) – flag if retraining is required (the new point is a potential sv or a removed one was a sv)
Return type:

[flag if new data is in training set, flag if retraining is required (the new point is a potential sv or a removed one was a sv)]

retrain(data, class_label, new_data_in_training_set, retraining_required)[source]

Start retraining procedure if the training set changed.

Parameters:
  • data (list of float) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
  • new_data_in_training_set (bool) – flag if new data is in training set
  • retraining_required (bool) – flag if retraining is required (the new point is a potential sv or a removed one was a sv)
relabel_training_set()[source]

Relabel the training set according to the current decision function.

is_potential_support_vector(data, class_label=None)[source]

Check whether the given data could become a support vector

This is when the data is within, on or on the other side of the margin.

Parameters:
  • data (list of float) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
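
For an SVM with labels -1 and +1, the condition "within, on or on the other side of the margin" corresponds to y*f(x) <= 1. The following sketch assumes this convention; the handling of a missing label (testing only |f(x)| <= 1) is an assumption of this example, and the signature differs from the method documented here, which receives the data rather than a decision value.

```python
def is_potential_support_vector(decision_value, label=None):
    """Check the margin condition for a signed decision value f(x)."""
    if label is None:
        # Assumption: without a label, only test whether the sample
        # lies inside or on the margin band.
        return abs(decision_value) <= 1.0
    return label * decision_value <= 1.0
```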
remove_no_border_points(retraining_required)[source]

Discard method to remove all samples from the training set that are not in the border of their class.

The border is determined by a minimum distance from the center of the class and a maximum distance.

Parameters:retraining_required – flag if retraining is required (the new point is a potential sv or a removed one was a sv)
add_new_sample(data, class_label=None, default=False)[source]

Add a new sample to the training set

Parameters:
  • data (list of float) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
  • default (bool) – Specifies if the sample is added to the current training set or to a future training set
remove_samples(idxs)[source]

Remove the samples at the given indices from the training set

Parameters:idxs (list of int) – Indices of the samples to remove.
Return type:bool - True if a support vector was removed.
remove_non_support_vectors()[source]

Remove all samples that are not support vectors

retrain_SVM()[source]

Retrain the svm with the current training set

incremental_training(data, class_label)[source]

Warm Start Implementation by Mario Michael Krell

The saved status of the algorithm, including the Matrix M, is used as a starting point for the iteration. Only the problem has to be lifted up one dimension.

learn_CDT()[source]

Learn features of the training set to detect changes in the underlying distribution

change_detection_test(data, class_label=None)[source]

Detect a change of the distribution

Parameters:
  • data (list of float) – A new sample for the training set.
  • class_label (str) – The label of the new sample.
Return type:

bool - If change detected return True

get_platt_prob(deci, label, prior1, prior0, data)[source]

Get a probability for the decision of the svm

Parameters:
  • deci (list of float) – List of decisions made for each sample.
  • label (list of bool) – List of labels from the previous samples.
  • prior1 (int) – Number of samples of class 1
  • prior0 (int) – Number of samples of class 0
  • data (list of float) – Sample under investigation
Return type:

[float, int] - probability and the corresponding label
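
Platt scaling maps a decision value f to a probability through a sigmoid, P(class 1 | f) = 1 / (1 + exp(A*f + B)). The sketch below assumes that the sigmoid parameters A and B have already been fitted (compare approximate_AB_for_plat_prob) and therefore uses a simpler, hypothetical signature than get_platt_prob. It returns the probability of the more likely class together with that class.

```python
import math


def platt_probability(decision_value, A, B):
    """Return [probability, label] for one decision value (illustration)."""
    fApB = decision_value * A + B
    # Numerically stable evaluation of 1 / (1 + exp(fApB))
    if fApB >= 0:
        p1 = math.exp(-fApB) / (1.0 + math.exp(-fApB))
    else:
        p1 = 1.0 / (1.0 + math.exp(fApB))
    if p1 >= 0.5:
        return [p1, 1]
    return [1.0 - p1, 0]
```

A fitted A is typically negative, so that large positive decision values give probabilities close to 1 for class 1.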

approximate_AB_for_plat_prob(deci, label, prior1, prior0)[source]

Approximate the distribution of both classes

Parameters:
  • deci (list of float) – List of decisions made for each sample.
  • label (list of bool) – List of labels from the previous samples.
  • prior1 (int) – Number of samples of class 1
  • prior0 (int) – Number of samples of class 0
Return type:

[float, float] - ([A, B] - parameters of sigmoid)

plot_line(coef, *args, **kwargs)[source]

Plot a line (y=a*x+b or a*x+b*y+c=0) with the given coefficients

Parameters:coef (list of floats) – Coefficients determining the line
Return type:list of lines
circle_out(x, y, s=20, *args, **kwargs)[source]

Circle out points with size ‘s’.

Parameters:
  • x (list of float) – x coordinates.
  • y (list of float) – y coordinates.
  • s (int) – Size of circle

plot_data(x, y, target, s=20, *args, **kwargs)[source]

Plot points with size ‘s’

Parameters:
  • x (list of float) – x coordinates.
  • y (list of float) – y coordinates.
  • target (bool) – Determine class label.
  • s (int) – Size of point.
plot_hyperplane()[source]

Plot the hyperplane (in 2D a line).

plot_samples()[source]

Plot all training samples.

Plot all training samples and mark the class association.

plot_support_vectors()[source]

Mark the support vectors by a circle.

plot_class_borders(mStandard, mTarget, R, scaleFactorSmall, scaleFactorTall)[source]

Plot the borders of each class.

Parameters:
  • mStandard ([float, float] - (x,y)) – Center of standard class.
  • mTarget ([float, float] - (x,y)) – Center of target class.
  • R (float) – Distance between both centers.
  • scaleFactorSmall (float) – Determine inner circle of class border.
  • scaleFactorTall (float) – Determine outer circle of class border.
plot_data_3D(x, y, z, target, s=20, *args, **kwargs)[source]

Plot points with size ‘s’

Parameters:
  • x (list of float) – x coordinates.
  • y (list of float) – y coordinates.
  • z (list of float) – z coordinates.
  • target (bool) – Determine class label.
  • s (int) – Size of point.
plot_samples_3D()[source]

Plot all training samples.

Plot all training samples and mark the class association.

plot_hyperplane_3D()[source]

Plot the hyperplane (in 3D a surface).

visualize()[source]

Show the training samples, the support vectors if possible and the current decision function

_RegularizedClassifierBase__intersect(rect, line)

Calculate the points of a line in a given rectangle

Parameters:
  • rect (list of float) – Parameters of a rectangle (min x, min y, max x, max y).
  • line (list of float) – line given as y=a*x+b or a*x+b*y+c=0
Return type:

list of pairs of float
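
One way to compute such intersection points is to evaluate the line on each of the four rectangle edges. The sketch assumes that a line with two coefficients [a, b] means y=a*x+b and one with three coefficients [a, b, c] means a*x+b*y+c=0; the function name is hypothetical and the private pySPACE method may work differently.

```python
def line_rect_intersections(rect, line):
    """Return the sorted points where the line crosses the rectangle border."""
    min_x, min_y, max_x, max_y = rect
    if len(line) == 2:
        # y = a*x + b  is rewritten as  a*x - y + b = 0
        a, b, c = line[0], -1.0, line[1]
    else:
        a, b, c = line
    points = set()
    if b != 0:
        # left and right edges: x is fixed, solve for y
        for x in (min_x, max_x):
            y = -(a * x + c) / b
            if min_y <= y <= max_y:
                points.add((float(x), float(y)))
    if a != 0:
        # bottom and top edges: y is fixed, solve for x
        for y in (min_y, max_y):
            x = -(b * y + c) / a
            if min_x <= x <= max_x:
                points.add((float(x), float(y)))
    return sorted(points)
```

Corner hits are reported once because the points are collected in a set.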

__hyperparameters = set([ChoiceParameter<kernel_type>, NormalParameter<ratio>, NoOptimizationParameter<kwargs_warning>, NoOptimizationParameter<dtype>, NoOptimizationParameter<output_dim>, NoOptimizationParameter<use_list>, LogNormalParameter<epsilon>, BooleanParameter<regression>, NoOptimizationParameter<retrain>, LogUniformParameter<complexity>, NoOptimizationParameter<store>, NoOptimizationParameter<input_dim>, QNormalParameter<offset>, NoOptimizationParameter<debug>, QUniformParameter<max_time>, LogNormalParameter<tolerance>, UniformParameter<nu>, NoOptimizationParameter<keep_vectors>])

TimeoutException

class pySPACE.missions.nodes.classification.base.TimeoutException[source]

Bases: exceptions.Exception

Break off simplex iterations that take too long

__weakref__

list of weak references to the object (if defined)