# Boosting the margin: A new explanation for the effectiveness of voting methods

@inproceedings{Schapire1997BoostingTM, title={Boosting the margin: A new explanation for the effectiveness of voting methods}, author={Robert E. Schapire and Yoav Freund and Peter Bartlett and Wee Sun Lee}, booktitle={ICML}, year={1997} }

One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between…
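The margin definition in the abstract is easy to make concrete: the vote weight placed on an example's correct label, minus the largest vote weight placed on any incorrect label. A minimal sketch (not from the paper; the function name and array layout are my own) that computes this for every training example:

```python
import numpy as np

def voting_margins(votes, weights, labels):
    """Margin of each example under a weighted voting classifier.

    votes:   (T, n) integer array; votes[t, i] is base classifier t's
             predicted class for example i.
    weights: (T,) nonnegative voting weights.
    labels:  (n,) true class ids.
    Returns an (n,) array of margins in [-1, 1]: the total vote weight
    on the correct label minus the largest vote weight on any
    incorrect label.
    """
    weights = weights / weights.sum()      # normalize so margins lie in [-1, 1]
    n = votes.shape[1]
    classes = np.unique(np.concatenate([votes.ravel(), labels]))
    # total vote mass assigned to each class, for every example: (C, n)
    mass = np.stack([weights @ (votes == c) for c in classes])
    true_rows = np.searchsorted(classes, labels)   # row index of each true class
    correct = mass[true_rows, np.arange(n)]
    mass[true_rows, np.arange(n)] = -np.inf        # exclude the true class
    return correct - mass.max(axis=0)
```

A positive margin means the weighted vote classifies the example correctly; the paper's analysis concerns the distribution of these values over the training set.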

#### 2,775 Citations

Further results on the margin explanation of boosting: new algorithm and experiments

- Mathematics, Computer Science
- Science China Information Sciences
- 2012

An efficient algorithm is developed that, given a boosting classifier, learns a new voting classifier that usually has a smaller Emargin bound; the new classifier often has smaller test error, in agreement with what the Emargin theory predicts.

How boosting the margin can also boost classifier complexity

- Mathematics, Computer Science
- ICML
- 2006

A close look at Breiman's compelling but puzzling results finds that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by experiments and entirely consistent with the margins theory.

On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods

- Computer Science, Mathematics
- ArXiv
- 2019

The large margins theory is shown to be not sufficient for explaining the performance of voting classifiers by illustrating how it is possible to improve upon the margin distribution of an ensemble solution, while keeping the complexity fixed, yet not improve the test set performance.

Boosting in the Limit: Maximizing the Margin of Learned Ensembles

- Computer Science
- AAAI/IAAI
- 1998

The crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open, and it is concluded that no simple version of the minimum-margin story can be complete.

The role of margins in boosting and ensemble performance

- Computer Science
- 2014

The role of margins in boosting and ensemble-method performance is examined; in most instances boosting achieves lower generalization error than competing ensemble methodologies such as bagging and random forests.

Maximizing the Margin with Boosting

- Mathematics, Computer Science
- COLT
- 2002

An iterative version of AdaBoost is given that explicitly maximizes the minimum margin of the examples, together with a bound on the number of hypotheses needed in the final linear combination to approximate the maximum-margin hyperplane to a given precision.

Further results on the margin distribution

- Mathematics, Computer Science
- COLT '99
- 1999

It is shown that in the linear case the approach can be viewed as a change of kernel and that the algorithms arising from the approach are exactly those originally proposed by Cortes and Vapnik.

Analyzing Margins in Boosting

- 2005

While the success of boosting or voting methods has been evident from experimental data [11], questions about why boosting does not overfit on training data remain. One idea about the effectiveness…

On the Margin Explanation of Boosting Algorithms

- Mathematics, Computer Science
- COLT
- 2008

A bound in terms of a new margin measure called Equilibrium margin (Emargin) is proved, which suggests that the minimum margin is not crucial for the generalization error and shows that a large Emargin implies good generalization.

Supervised projection approach for boosting classifiers

- Mathematics, Computer Science
- Pattern Recognit.
- 2009

A new approach to boosting for the construction of ensembles of classifiers is presented, based on using the distribution given by the weighting scheme of boosting to construct a non-linear supervised projection of the original variables, instead of using the instance weights to train the next classifier.

#### References

SHOWING 1-10 OF 51 REFERENCES

Boosting in the Limit: Maximizing the Margin of Learned Ensembles

- Computer Science
- AAAI/IAAI
- 1998

The crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open, and it is concluded that no simple version of the minimum-margin story can be complete.

Experiments with a New Boosting Algorithm

- Computer Science
- ICML
- 1996

This paper describes experiments assessing how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.

An Empirical Evaluation of Bagging and Boosting

- Computer Science
- AAAI/IAAI
- 1997

The results clearly show that even though Bagging almost always produces a better classifier than any of its individual component classifiers and is relatively impervious to overfitting, it does not generalize any better than a baseline neural-network ensemble method.

Bagging, Boosting, and C4.5

- Computer Science
- AAAI/IAAI, Vol. 1
- 1996

Results of applying Breiman's bagging and Freund and Schapire's boosting to a system that learns decision trees, tested on a representative collection of datasets, show that boosting delivers the greater benefit.

Structural Risk Minimization Over Data-Dependent Hierarchies

- Computer Science
- IEEE Trans. Inf. Theory
- 1998

A result is presented that allows one to trade off errors on the training sample against improved generalization performance, and a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets.

Improved Boosting Algorithms using Confidence-Rated Predictions

- Mathematics, Computer Science
- COLT
- 1998

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…

For Valid Generalization the Size of the Weights is More Important than the Size of the Network

- Mathematics, Computer Science
- NIPS
- 1996

This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the…

Training Methods for Adaptive Boosting of Neural Networks

- Computer Science
- NIPS
- 1997

This paper uses AdaBoost to improve the performances of neural networks and compares training methods based on sampling the training set and weighting the cost function.

Boosting Decision Trees

- Computer Science
- NIPS
- 1995

A constructive, incremental learning system for regression problems is presented that models data by means of locally linear experts which do not compete for data during learning, and asymptotic results are derived for this method.