# A training algorithm for optimal margin classifiers

@inproceedings{Boser1992ATA, title={A training algorithm for optimal margin classifiers}, author={Bernhard E. Boser and Isabelle Guyon and Vladimir Naumovich Vapnik}, booktitle={Annual Conference Computational Learning Theory}, year={1992} }

A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to…
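As a rough illustration of the abstract's claim that the solution is a linear combination of supporting patterns, the sketch below trains a maximum-margin classifier on a toy separable dataset and lists the patterns that end up with nonzero weight. It is a minimal sketch under stated assumptions, not the authors' procedure: it swaps in plain dual coordinate ascent for a quadratic programming solver, folds the bias into the kernel by adding a constant 1, and uses hypothetical names (`train_max_margin`, `decision`, `linear_kernel`) and made-up data.

```python
# Illustrative sketch only: dual coordinate ascent on the hard-margin dual.
# This is NOT the paper's solver; all names and the toy data are hypothetical.

def linear_kernel(u, v):
    # The "+ 1.0" absorbs the bias term into the kernel (an assumption of
    # this sketch; it also regularizes the bias, unlike a separate bias).
    return sum(a * b for a, b in zip(u, v)) + 1.0

def train_max_margin(xs, ys, kernel=linear_kernel, sweeps=200):
    """Return one non-negative multiplier per training pattern."""
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Current decision value at pattern i.
            f_i = sum(alpha[j] * ys[j] * gram[i][j] for j in range(n))
            # Exact coordinate maximization, clipped so alpha stays >= 0.
            alpha[i] = max(0.0, alpha[i] + (1.0 - ys[i] * f_i) / gram[i][i])
    return alpha

def decision(x, xs, ys, alpha, kernel=linear_kernel):
    # Only patterns with alpha > 0 (the supporting patterns) contribute.
    return sum(a * y * kernel(p, x) for a, y, p in zip(alpha, ys, xs) if a > 0.0)

# Toy linearly separable data: three negative and three positive patterns.
xs = [(0, 0), (-1, 0), (0, -1), (2, 2), (3, 2), (2, 3)]
ys = [-1, -1, -1, 1, 1, 1]

alpha = train_max_margin(xs, ys)
supporting = [i for i, a in enumerate(alpha) if a > 1e-8]
print("supporting pattern indices:", supporting)
print("decision at (1, 1):", decision((1, 1), xs, ys, alpha))
```

On this toy set only the two patterns nearest the boundary, `(0, 0)` and `(2, 2)`, keep a nonzero multiplier; the other four could be removed without changing the classifier. Passing a different `kernel` function (for example a polynomial or radial basis function, with a positive diagonal) changes the class of decision functions without changing the training loop.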

## 11,303 Citations

### Adaptive training methods for optimal margin classification

- Computer Science
- IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339)
- 1999

This study considers adaptive training schemes for optimal margin classification with neural networks and describes some novel schemes and compares them with the conventional schemes.

### Automatic Capacity Tuning of Very Large VC-Dimension Classifiers

- Computer Science
- NIPS
- 1992

It is shown that even high-order polynomial classifiers in high dimensional spaces can be trained with a small amount of training data and yet generalize better than classifiers with a smaller VC-dimension.

### Pattern Selection for Support Vector Classifiers

- Computer Science
- IDEAL
- 2002

A k-nearest neighbors (k-NN) based pattern selection method that tries to select the patterns that are near the decision boundary and that are correctly labeled to reduce training time of redundant SVs.

### Pattern recognition with novel support vector machine learning method

- Computer Science
- 2000 10th European Signal Processing Conference
- 2000

This study investigates the basic SVM method and points out some problems that may arise especially in large scale problems with abundant data, and proposes a novel SVM type method that aims to avoid the problems found in the basic method.

### New support vector algorithms with parametric insensitive/margin model

- Computer Science
- Neural Networks
- 2010

### Training Data Selection for Support Vector Machines

- Computer Science
- ICNC
- 2005

This paper proposes two new methods that select a subset of data for SVM training and shows that a significant amount of training data can be removed by the proposed methods without degrading the performance of the resulting SVM classifiers.

### Fast Pattern Selection for Support Vector Classifiers

- Computer Science
- PAKDD
- 2003

A k-nearest neighbors (k-NN) based pattern selection method that tries to select the patterns that are near the decision boundary and that are correctly labeled to reduce training time of redundant SVs.

### Perceptron-like large margin classifiers

- Computer Science
- 2005

As the data are embedded in the augmented space at a larger distance from the origin, the maximum margin in that space approaches the maximum geometric one in the original space, and the algorithmic procedure could be regarded as an approximate maximal margin classifier.

### Selecting Data for Fast Support Vector Machines Training

- Computer Science
- Trends in Neural Computation
- 2007

This paper proposes two new methods that select a subset of data for SVM training and shows that a significant amount of training data can be removed by the proposed methods without degrading the performance of the resulting SVM classifiers.

### On the proliferation of support vectors in high dimensions

- Computer Science
- AISTATS
- 2021

This paper identifies new deterministic equivalences for this phenomenon of support vector proliferation, and uses them to substantially broaden the conditions under which the phenomenon occurs in high-dimensional settings, and proves a nearly matching converse result.

## References

### Structural Risk Minimization for Character Recognition

- Computer Science
- NIPS
- 1991

The method of Structural Risk Minimization is used to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition.

### Computer aided cleaning of large databases for character recognition

- Computer Science
- Proceedings., 11th IAPR International Conference on Pattern Recognition. Vol.II. Conference B: Pattern Recognition Methodology and Systems
- 1992

By using the method of pattern cleaning, combined with an emphasizing scheme applied on the patterns that are hard to learn, the error rate on the test set has been reduced by half, in the case of the database of handwritten lowercase characters entered on a touch terminal.

### Comparing different neural network architectures for classifying handwritten digits

- Computer Science
- International 1989 Joint Conference on Neural Networks
- 1989

The authors propose a novel way of organizing the network architectures by training several small networks so as to deal separately with subsets of the problem, and then combining the results.

### Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks

- Computer Science
- Science
- 1990

A theory is reported that shows the equivalence between regularization and a class of three-layer networks called regularization networks or hyper basis functions.

### Consistent inference of probabilities in layered networks: predictions and generalizations

- Computer Science
- International 1989 Joint Conference on Neural Networks
- 1989

The problem of learning a general input-output relation using a layered neural network is discussed in a statistical framework and the authors arrive at a Gibbs distribution on a canonical ensemble of networks with the same architecture.

### Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network

- Computer Science
- NIPS
- 1991

A scheme is implemented that allows a network to learn the derivative of its outputs with respect to distortion operators of their choosing, which not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations the authors wish the network to perform.

### What Size Net Gives Valid Generalization?

- Mathematics, Computer Science
- Neural Computation
- 1989

It is shown that if $m \geq O\left(\frac{W}{\epsilon} \log \frac{N}{\epsilon}\right)$ random examples can be loaded on a feedforward network of linear threshold functions with $N$ nodes and $W$ weights, so that at least a fraction $1 - \epsilon/2$ of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction $1 - \epsilon$ of future test examples drawn from the same distribution.

### Predicting {0,1}-functions on randomly drawn points

- Computer Science
- COLT '88
- 1988

This model is related to Valiant's PAC learning model, but does not require the hypotheses used for prediction to be represented in any specified form and shows how to construct prediction strategies that are optimal to within a constant factor for any reasonable class F of target functions.

### Neural Networks and the Bias/Variance Dilemma

- Computer Science, Psychology
- Neural Computation
- 1992

It is suggested that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues.

### Fast Learning in Networks of Locally-Tuned Processing Units

- Computer Science
- Neural Computation
- 1989

We propose a network architecture which uses a single internal layer of locally-tuned processing units to learn both classification tasks and real-valued function approximations (Moody and Darken…