How boosting the margin can also boost classifier complexity

  • L. Reyzin, R. Schapire
  • Published 2006
  • Mathematics, Computer Science
  • Proceedings of the 23rd International Conference on Machine Learning
Boosting methods are known usually not to overfit training data, even as the size of the generated classifiers becomes large. Schapire et al. attempted to explain this phenomenon in terms of the margins the classifier achieves on training examples. Later, however, Breiman cast serious doubt on this explanation by introducing a boosting algorithm, arc-gv, that can generate a higher margins distribution than AdaBoost and yet performs worse. In this paper, we take a close look at Breiman's…
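For readers unfamiliar with the term, the margin of a training example under a voting classifier is commonly defined as the label times the weighted vote of the base hypotheses, divided by the total vote weight, so it lies in [-1, 1] and is positive exactly when the example is classified correctly. The following is a minimal sketch of that computation, not code from the paper; the function name `normalized_margins` and the toy predictions, weights, and labels are hypothetical and chosen only for illustration.

```python
def normalized_margins(predictions, alphas, labels):
    """Return the normalized margin of each example under a weighted vote.

    predictions: one list per base hypothesis, each holding that
                 hypothesis's prediction (-1 or +1) for every example.
    alphas:      nonnegative vote weight of each base hypothesis.
    labels:      true label (-1 or +1) of every example.
    """
    total = sum(alphas)
    margins = []
    for i, y in enumerate(labels):
        # Weighted vote of all base hypotheses on example i.
        vote = sum(a * h[i] for a, h in zip(alphas, predictions))
        margins.append(y * vote / total)
    return margins


# Hypothetical toy ensemble: three base hypotheses, four examples.
predictions = [
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, 1, 1, -1],
]
alphas = [0.5, 0.3, 0.2]
labels = [1, 1, -1, -1]

margins = normalized_margins(predictions, alphas, labels)
min_margin = min(margins)  # the quantity arc-gv is designed to maximize
print(margins, min_margin)
```

The full list `margins` is the empirical margin distribution discussed in the abstract, while `min_margin` is the minimum margin. Comparing two ensembles on these quantities alone says nothing about the complexity of their base hypotheses, which is the factor the paper's title points to.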
On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods
The large margins theory is shown to be insufficient for explaining the performance of voting classifiers: it is possible to improve the margin distribution of an ensemble solution while keeping its complexity fixed, yet not improve test set performance.
Boosting Through Optimization of Margin Distributions
A new boosting algorithm, termed margin-distribution boosting (MDBoost), is designed to directly maximize the average margin while minimizing the margin variance, and a totally corrective optimization algorithm based on column generation is proposed to implement it.
Margins are Insufficient for Explaining Gradient Boosting
This work demonstrates that the $k$'th margin bound is inadequate in explaining the performance of state-of-the-art gradient boosters, and proves a stronger and more refined margin-based generalization bound for boosted classifiers that does succeed in explaining the performance of modern gradient boosters.
On the Margin Explanation of Boosting Algorithms
A bound in terms of a new margin measure called Equilibrium margin (Emargin) is proved, which suggests that the minimum margin is not crucial for the generalization error and shows that a large Emargin implies good generalization.
Further results on the margin explanation of boosting: new algorithm and experiments
An efficient algorithm is developed that, given a boosting classifier, learns a new voting classifier which usually has a smaller Emargin bound. The new classifier is found to often have smaller test errors, which agrees with what the Emargin theory predicts.
Margin Distribution Controlled Boosting
This paper empirically demonstrates that AdaBoost is actually a margin distribution (MD) controlled algorithm whose iteration number acts as a parameter controlling the distribution. The generalization performance of MCBoost, evaluated on UCI benchmark datasets, is reported to be better than that of AdaBoost, L2Boost, LPBoost, AdaBoost-CG and MDBoost.
Optimal Minimal Margin Maximization with Boosting
A new algorithm refuting the conjecture that an optimal trade-off between number of hypotheses trained and the minimal margin over all training points is possible and a lower bound is proved which implies that the new algorithm is optimal.
On the doubt about margin explanation of boosting
This paper defends the margin-based explanation against Breiman's doubts by proving a new generalization error bound that considers exactly the same factors as Schapire et al. (1998) but is sharper than Breiman's (1999) minimum margin bound.
A Refined Margin Analysis for Boosting Algorithms via Equilibrium Margin
A refined analysis of the margin theory is made, which proves a bound in terms of a new margin measure called the Equilibrium margin (Emargin) which is uniformly sharper than Breiman's minimum margin bound.
On the Current State of Research in Explaining Ensemble Performance Using Margins
Several techniques are proposed, and evidence is provided suggesting that the generalization error of a voting classifier might be reduced by increasing the mean and decreasing the variance of the margins, in line with the current state of research in explaining ensemble performance.


Boosting in the Limit: Maximizing the Margin of Learned Ensembles
The crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open, and it is concluded that no simple version of the minimum-margin story can be complete.
Boosting the margin: A new explanation for the effectiveness of voting methods
It is shown that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error.
Maximizing the Margin with Boosting
An iterative version of AdaBoost is given that explicitly maximizes the minimum margin of the examples, and bounds are given on the number of iterations and the number of hypotheses used in the final linear combination, which approximates the maximum margin hyperplane with a certain precision.
Boosting Based on a Smooth Margin
This work introduces a smooth approximation of the margin that one can maximize in order to produce a maximum margin classifier for AdaBoost, and attempts to understand AdaBoost in terms of the authors' smooth margin.
Bagging, Boosting, and C4.5
Results of applying Breiman's bagging and Freund and Schapire's boosting to a system that learns decision trees, tested on a representative collection of datasets, show that boosting yields the greater benefit.
Improved Boosting Algorithms Using Confidence-rated Predictions
We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…
Empirical margin distributions and bounding the generalization error of combined classifiers
We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by…
Boosting Decision Trees
A constructive, incremental learning system for regression problems that models data by means of locally linear experts that does not compete for data during learning and derives asymptotic results for this method.
Arcing Classifiers
Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. One of the more effective is bagging (Breiman [1996a])…
An Introduction to Boosting and Leveraging
We provide an introduction to theoretical and practical aspects of Boosting and Ensemble learning, providing a useful reference for researchers in the field of Boosting as well as for those seeking…