Boosting Algorithms as Gradient Descent


Much recent attention, both experimental and theoretical, has been focused on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinations of functions that minimize arbitrary cost functionals (i.e., functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of this abstract algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on several data sets from the UC Irvine repository demonstrate that DOOM II generally outperforms AdaBoost, especially in high noise situations. Margin distribution plots verify that DOOM II is willing to 'give up' on examples that are too hard in order to avoid overfitting. We also show that the overfitting behavior exhibited by AdaBoost can be quantified in terms of our proposed cost function.
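The gradient-descent view of boosting that the abstract describes can be sketched concretely: each training example is weighted by the negative derivative of a margin cost evaluated at its current margin y_i F(x_i), and a weak learner that maximizes the weighted edge supplies the descent direction for the next round. The sketch below is a minimal illustration under my own assumptions (a decision-stump weak learner, AdaBoost's line-search step size, and a synthetic data set), not the paper's exact AnyBoost or DOOM II implementation; choosing the exponential cost c(z) = exp(-z) recovers AdaBoost's example weighting as a special case.

```python
import numpy as np

def boost_gradient_descent(X, y, cost_grad, n_rounds=10):
    """Sketch of boosting as gradient descent on a margin cost functional.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    cost_grad: derivative c'(z) of a margin cost c(z). Example weights are
    -c'(y_i F(x_i)), so examples with small or negative margins weigh more.
    Returns the ensemble (list of weighted stumps) and the combined outputs F.
    """
    n, d = X.shape
    F = np.zeros(n)               # current combined output F(x_i) on the sample
    ensemble = []                 # (alpha, feature, threshold, sign) per round
    for _ in range(n_rounds):
        w = -cost_grad(y * F)     # gradient-based example weights
        total = w.sum()
        if total <= 0:
            break
        w = w / total
        # Weak learner (an assumption here): exhaustive decision stump
        # search maximizing the weighted edge sum_i w_i y_i h(x_i).
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for s in (1.0, -1.0):
                    h = s * np.where(X[:, j] > thr, 1.0, -1.0)
                    edge = float(np.sum(w * y * h))
                    if best is None or edge > best[0]:
                        best = (edge, j, thr, s)
        edge, j, thr, s = best
        if edge <= 0:
            break                 # no direction decreases the cost further
        edge = min(edge, 1 - 1e-10)                 # guard the log below
        alpha = 0.5 * np.log((1 + edge) / (1 - edge))  # line-search step size
        F += alpha * s * np.where(X[:, j] > thr, 1.0, -1.0)
        ensemble.append((alpha, j, thr, s))
    return ensemble, F

# Exponential margin cost c(z) = exp(-z): its negative derivative exp(-z)
# reproduces AdaBoost's exponential reweighting of hard examples.
exp_cost_grad = lambda z: -np.exp(-z)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # toy linearly separable labels
ensemble, F = boost_gradient_descent(X, y, exp_cost_grad, n_rounds=10)
train_acc = float(np.mean(np.sign(F) == y))
```

Swapping `exp_cost_grad` for the derivative of a flatter cost (one that stops penalizing very negative margins) is the sense in which an algorithm like DOOM II can 'give up' on hopelessly hard examples rather than concentrating ever more weight on them.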


Cite this paper

@inproceedings{Mason1999BoostingAA,
  title={Boosting Algorithms as Gradient Descent},
  author={Llew Mason and Jonathan Baxter and Peter L. Bartlett and Marcus R. Frean},
  booktitle={NIPS},
  year={1999}
}