Boosting the margin: A new explanation for the effectiveness of voting methods

@inproceedings{Schapire1997BoostingTM,
  title={Boosting the margin: A new explanation for the effectiveness of voting methods},
  author={Robert E. Schapire and Yoav Freund and Peter Bartlett and Wee Sun Lee},
  booktitle={ICML},
  year={1997}
}
One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between…
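The margin quantity the abstract refers to can be made concrete with a short sketch. The following minimal Python illustration (not the authors' code; the function and argument names are hypothetical) computes, for each training example, the weighted vote received by the correct label minus the largest weighted vote received by any single incorrect label, so correctly classified examples have positive margin and the values lie in [-1, 1].

import numpy as np

def voting_margins(base_preds, alphas, y):
    # base_preds: (T, m) array, row t holds base classifier t's predicted labels
    # alphas:     (T,) nonnegative voting weights (normalized here to sum to 1)
    # y:          (m,) true labels
    # returns:    (m,) margins = weighted vote for the correct label minus the
    #             largest weighted vote for any single incorrect label
    base_preds = np.asarray(base_preds)
    y = np.asarray(y)
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas / alphas.sum()
    labels = np.unique(np.concatenate([base_preds.ravel(), y]))
    margins = np.empty(len(y))
    for i in range(len(y)):
        votes = {lab: alphas[base_preds[:, i] == lab].sum() for lab in labels}
        correct = votes[y[i]]
        wrong = max((v for lab, v in votes.items() if lab != y[i]), default=0.0)
        margins[i] = correct - wrong
    return margins

# Example: three weighted base classifiers voting on four examples
# H = np.array([[0, 1, 1, 0],
#               [0, 1, 0, 0],
#               [1, 1, 1, 0]])
# voting_margins(H, alphas=[0.5, 0.3, 0.2], y=[0, 1, 1, 0])
# -> array([0.6, 1.0, 0.4, 1.0])

In the binary case with labels in {-1, +1}, this quantity reduces to y times the normalized weighted sum of the base classifiers' predictions.
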
Citations

Further results on the margin explanation of boosting: new algorithm and experiments
An efficient algorithm is developed that, given a boosting classifier, learns a new voting classifier which usually has a smaller Emargin bound; the new classifier often has smaller test error, in agreement with what the Emargin theory predicts.
How boosting the margin can also boost classifier complexity
A close look at Breiman's compelling but puzzling results finds that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by experiments and entirely consistent with the margins theory.
On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods
The large margins theory is shown to be insufficient to explain the performance of voting classifiers, by illustrating how it is possible to improve upon the margin distribution of an ensemble solution, while keeping the complexity fixed, yet not improve the test set performance.
Boosting in the Limit: Maximizing the Margin of Learned Ensembles
The crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open, and it is concluded that no simple version of the minimum-margin story can be complete.
The role of margins in boosting and ensemble performance
The role of margins in boosting and ensemble method performance is examined; boosting in most instances has lower generalization error than competing ensemble methodologies such as bagging and random forests.
Maximizing the Margin with Boosting
An iterative version of AdaBoost is given that explicitly maximizes the minimum margin of the examples, together with a bound on the number of hypotheses used in the final linear combination, which approximates the maximum-margin hyperplane to a certain precision. (See the generic AdaBoost sketch after this list.)
Further results on the margin distribution
It is shown that in the linear case the approach can be viewed as a change of kernel and that the algorithms arising from the approach are exactly those originally proposed by Cortes and Vapnik.
Analyzing Margins in Boosting
While the success of boosting or voting methods has been evident from experimental data [11], questions about why boosting does not overfit on training data remain. One idea about the effectiveness…
On the Margin Explanation of Boosting Algorithms
A bound in terms of a new margin measure called Equilibrium margin (Emargin) is proved, which suggests that the minimum margin is not crucial for the generalization error and shows that a large Emargin implies good generalization.
Supervised projection approach for boosting classifiers
A new approach to boosting for the construction of ensembles of classifiers is proposed, based on using the distribution given by the weighting scheme of boosting to construct a non-linear supervised projection of the original variables, instead of using the instance weights to train the next classifier.
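
Several of the entries above, e.g. "Maximizing the Margin with Boosting", describe modifications of how AdaBoost chooses its voting weights. For orientation, here is a minimal generic sketch of the standard binary AdaBoost loop that such variants build on; it is not the algorithm of any particular paper listed here, and weak_learner is a hypothetical callable assumed to fit a base classifier to weighted data and expose a predict method.

import numpy as np

def adaboost(X, y, weak_learner, rounds=100):
    # Generic binary AdaBoost sketch; labels y must be -1/+1.
    # weak_learner(X, y, sample_weight) is assumed to return an object with
    # a .predict(X) method (a hypothetical interface for illustration).
    y = np.asarray(y)
    m = len(y)
    D = np.full(m, 1.0 / m)                 # distribution over training examples
    classifiers, alphas = [], []
    for _ in range(rounds):
        h = weak_learner(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()            # weighted training error of h
        if eps >= 0.5:                      # no better than random guessing: stop
            break
        eps = max(eps, 1e-12)               # guard against division by zero
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        D = D * np.exp(-alpha * y * pred)   # up-weight misclassified examples
        D = D / D.sum()
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, np.array(alphas)

After normalization, the alphas returned here are the voting weights whose margin distribution the paper above studies; margin-oriented variants such as the one summarized in that entry typically change how alpha is chosen or how the distribution is updated so as to drive the minimum margin up.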
