Boosting the margin: A new explanation for the effectiveness of voting methods

@inproceedings{Schapire1997BoostingTM,
  title={Boosting the margin: A new explanation for the effectiveness of voting methods},
  author={Robert E. Schapire and Yoav Freund and Peter Bartlett and Wee Sun Lee},
  booktitle={International Conference on Machine Learning},
  year={1997}
}
One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between… 
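
To make the margin concrete: for a weighted binary vote, an example's normalized margin is the weight of base classifiers voting for the correct label minus the weight voting against it, so it lies in [-1, 1] and is positive exactly when the example is classified correctly. A minimal NumPy sketch of this computation (illustrative only; the function name and toy data are hypothetical):

import numpy as np

def voting_margins(predictions, weights, y):
    """Normalized margins of a weighted binary voting classifier.

    predictions: (T, n) array with entries in {-1, +1}; row t holds the
                 votes of base classifier t on the n examples.
    weights:     (T,) nonnegative classifier weights (e.g. boosting's alphas).
    y:           (n,) true labels in {-1, +1}.
    Returns the (n,) margins in [-1, 1]; margin > 0 iff correctly classified.
    """
    w = np.asarray(weights, dtype=float)
    f = w @ predictions / w.sum()   # weighted vote for each example, in [-1, 1]
    return y * f                    # correct-vote weight minus incorrect-vote weight

# Toy usage: three base classifiers voting on four examples, all labeled +1.
preds = np.array([[+1, -1, +1, +1],
                  [+1, +1, -1, +1],
                  [-1, +1, +1, +1]])
print(voting_margins(preds, np.array([0.5, 0.3, 0.2]), np.ones(4)))
# -> approximately [0.6, 0.0, 0.4, 1.0]

The paper's margin-distribution view looks at the whole histogram of these values over the training set, not just the training error (the fraction that are negative).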


Further results on the margin explanation of boosting: new algorithm and experiments

An efficient algorithm is developed that, given a boosting classifier, learns a new voting classifier which usually has a smaller Emargin bound; experiments find that the new classifier often has smaller test error, in agreement with what the Emargin theory predicts.

How boosting the margin can also boost classifier complexity

A close look at Breiman's compelling but puzzling results finds that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by experiments and entirely consistent with the margins theory.

On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods

The large margins theory is shown to be insufficient for explaining the performance of voting classifiers, by illustrating how it is possible to improve upon the margin distribution of an ensemble solution, while keeping the complexity fixed, without improving the test set performance.

Boosting in the Limit: Maximizing the Margin of Learned Ensembles

The crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open, and it is concluded that no simple version of the minimum-margin story can be complete.

The role of margins in boosting and ensemble performance

The role of margins in boosting and ensemble method performance is examined; boosting can be very robust to overfitting, in most instances having lower generalization error than competing ensemble methodologies such as bagging and random forests.

Maximizing the Margin with Boosting

An iterative version of AdaBoost is given that explicitly maximizes the minimum margin of the examples, together with a bound on the number of hypotheses needed in the final linear combination to approximate the maximum-margin hyperplane to a given precision.
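
For context, the phenomenon these margin-maximizing variants build on can be seen in plain AdaBoost itself: as rounds are added, the normalized minimum margin typically keeps rising even after the training error reaches zero. A self-contained sketch with decision stumps (illustrative only; this is generic AdaBoost, not the algorithm of this entry, and all names are hypothetical):

import numpy as np

def stump_predict(X, feat, thresh, sign):
    # A decision stump: predict sign if X[:, feat] > thresh, else -sign.
    return sign * np.where(X[:, feat] > thresh, 1, -1)

def best_stump(X, y, d):
    # Exhaustive search for the stump minimizing weighted error under d.
    best = None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                err = d[stump_predict(X, feat, thresh, sign) != y].sum()
                if best is None or err < best[0]:
                    best = (err, feat, thresh, sign)
    return best

def adaboost_margins(X, y, rounds=10):
    n = len(y)
    d = np.full(n, 1.0 / n)   # distribution over training examples
    f = np.zeros(n)           # unnormalized ensemble score
    total_alpha = 0.0
    for t in range(rounds):
        err, feat, thresh, sign = best_stump(X, y, d)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # AdaBoost's hypothesis weight
        h = stump_predict(X, feat, thresh, sign)
        d = d * np.exp(-alpha * y * h)          # upweight misclassified examples
        d /= d.sum()
        f += alpha * h
        total_alpha += alpha
        margins = y * f / total_alpha           # normalized margins in [-1, 1]
        print(f"round {t+1:2d}: train error {(np.sign(f) != y).mean():.3f}, "
              f"min margin {margins.min():+.3f}")

# Toy data: 2-D points labeled by the sign of x0 + x1, so single stumps are
# weak on their own but boosting can combine them.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(80, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
adaboost_margins(X, y, rounds=10)

On data like this the training error typically hits zero within a few rounds while the minimum margin keeps increasing afterwards, which is exactly the behavior the margins explanation addresses.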

Further results on the margin distribution

It is shown that in the linear case the approach can be viewed as a change of kernel and that the algorithms arising from the approach are exactly those originally proposed by Cortes and Vapnik.

On the Margin Explanation of Boosting Algorithms

A bound in terms of a new margin measure called Equilibrium margin (Emargin) is proved, which suggests that the minimum margin is not crucial for the generalization error and shows that a large Emargin implies good generalization.

Supervised projection approach for boosting classifiers

Boosting Through Optimization of Margin Distributions

A new boosting algorithm is designed, termed margin-distribution boosting (MDBoost), which directly maximizes the average margin and minimizes the margin variance at the same time, and a totally corrective optimization algorithm based on column generation is proposed to implement MDBoost.
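
One plausible shape for such an objective (a sketch under stated assumptions, not necessarily the paper's exact formulation): writing the normalized margins as m_i(w), trade the average margin against the margin variance,

\max_{w \ge 0,\ \lVert w \rVert_1 = 1} \quad \frac{1}{n}\sum_{i=1}^{n} m_i(w) \;-\; \frac{\lambda}{2n}\sum_{i=1}^{n} \bigl( m_i(w) - \bar{m}(w) \bigr)^2, \qquad m_i(w) = y_i \sum_t w_t h_t(x_i),

where \bar{m}(w) is the average margin and \lambda > 0 balances the two terms; a column-generation scheme of the kind the summary mentions would add one base hypothesis at a time and re-solve this problem over the hypotheses generated so far.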
...

References

Showing 1-10 of 40 references

Boosting in the Limit: Maximizing the Margin of Learned Ensembles

The crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open, and it is concluded that no simple version of the minimum-margin story can be complete.

Experiments with a New Boosting Algorithm

This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.

An Empirical Evaluation of Bagging and Boosting

The results clearly show that even though Bagging almost always produces a better classifier than any of its individual component classifiers and is relatively impervious to overfitting, it does not generalize any better than a baseline neural-network ensemble method.

Bagging, Boosting, and C4.5

Results of applying Breiman's bagging and Freund and Schapire's boosting to a system that learns decision trees, tested on a representative collection of datasets, show that boosting provides the greater benefit.

Improved Boosting Algorithms Using Confidence-rated Predictions

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions.

For Valid Generalization the Size of the Weights is More Important than the Size of the Network

This paper shows that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.

Training Methods for Adaptive Boosting of Neural Networks

This paper uses AdaBoost to improve the performance of neural networks, and compares training methods based on sampling the training set and weighting the cost function.

Boosting Decision Trees


A training algorithm for optimal margin classifiers

A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.

Arcing Classifiers

Two arcing algorithms are explored and compared to each other and to bagging, and definitions of bias and variance for a classifier, as components of the test set error, are introduced.