# Boosting Algorithms as Gradient Descent

@inproceedings{Mason1999BoostingAA,
  title     = {Boosting Algorithms as Gradient Descent},
  author    = {Llew Mason and Jonathan Baxter and Peter L. Bartlett and Marcus Frean},
  booktitle = {NIPS},
  year      = {1999}
}

We provide an abstract characterization of boosting algorithms as gradient descent on cost functionals in an inner-product function space. We prove convergence of these functional-gradient-descent algorithms under quite weak conditions. Following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such…
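The gradient-descent view of boosting can be made concrete with a short sketch. This is not the paper's DOOM II algorithm itself, but an AnyBoost-style loop under assumed details: the exponential margin cost and axis-aligned decision stumps as the weak hypothesis class. Each round fits the stump best aligned with the negative functional gradient, then takes a step in function space (a fixed step size here, where a line search would normally pick it):

```python
import numpy as np

def fit_stump(X, y, w):
    # Weak learner: pick the axis-aligned threshold stump maximizing the
    # weighted correlation sum_i w_i * y_i * h(x_i), i.e. the inner product
    # of h with the negative functional gradient.
    best, best_score = None, -np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.where(X[:, j] <= t, 1.0, -1.0)
                score = np.sum(w * y * pred)
                if score > best_score:
                    best_score, best = score, (j, t, s)
    j, t, s = best
    return lambda Z: s * np.where(Z[:, j] <= t, 1.0, -1.0)

def anyboost(X, y, rounds=10, step=0.5):
    # Functional gradient descent on the exponential margin cost
    # C(F) = sum_i exp(-y_i * F(x_i)); the negative gradient at example i
    # has magnitude exp(-y_i * F(x_i)), which serves as the example weight.
    F = np.zeros(len(y))
    ensemble = []
    for _ in range(rounds):
        w = np.exp(-y * F)
        w = w / w.sum()
        h = fit_stump(X, y, w)
        ensemble.append((step, h))  # fixed step; a line search would tune it
        F = F + step * h(X)
    return lambda Z: np.sign(sum(a * h(Z) for a, h in ensemble))
```

On a trivially separable set such as `X = [[0],[1],[2],[3]]` with labels `[-1,-1,1,1]`, the returned ensemble classifies the training points correctly after a few rounds.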

#### 663 Citations

Cost-sensitive boosting algorithms as gradient descent

- Computer Science
- 2008 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2008

It is shown that several typical cost-sensitive boosting (CSB) methods can also be viewed as gradient descent for minimizing a unified objective function, from which a general greedy boosting procedure is deduced.

Historical Gradient Boosting Machine

- Computer Science
- GCAI
- 2018

The Historical Gradient Boosting Machine is introduced with the objective of improving the convergence speed of gradient boosting; it incorporates both the accumulated previous gradients and the current gradient into the computation of the descent direction in function space.
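The idea described above can be sketched in a few lines, under assumed details rather than the paper's exact procedure: squared-loss boosting with depth-1 regression stumps, where the base learner is fitted to a momentum-style accumulation of gradients instead of the current residual alone. The decay coefficient `beta` is a hypothetical parameter name for this sketch:

```python
import numpy as np

def fit_reg_stump(x, r):
    # Least-squares depth-1 stump on 1-D inputs, fitting targets r.
    best, best_err = None, np.inf
    for t in x:
        left, right = r[x <= t], r[x > t]
        if len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        err = np.sum((r - pred) ** 2)
        if err < best_err:
            best_err, best = err, (t, left.mean(), right.mean())
    t, a, b = best
    return lambda z: np.where(z <= t, a, b)

def historical_gbm(x, y, rounds=50, lr=0.1, beta=0.5):
    # Gradient boosting with a momentum-like descent direction: the stump is
    # fitted to beta * (accumulated past gradients) + (current negative
    # gradient), rather than to the current residual alone.
    F = np.zeros(len(y))
    m = np.zeros(len(y))
    ensemble = []
    for _ in range(rounds):
        g = y - F                # negative gradient of 0.5 * (y - F)^2
        m = beta * m + g         # accumulate gradient history
        h = fit_reg_stump(x, m)
        ensemble.append(h)
        F = F + lr * h(x)
    return lambda z: lr * sum(h(z) for h in ensemble)
```

Setting `beta = 0` recovers plain squared-loss gradient boosting, which makes the role of the historical term easy to isolate in experiments.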

Accelerated gradient boosting

- Computer Science, Mathematics
- Machine Learning
- 2019

It is empirically shown that AGB is less sensitive to the shrinkage parameter and outputs predictors that are considerably more sparse in the number of trees, while retaining the exceptional performance of gradient boosting.

Optimization by gradient boosting

- Mathematics, Computer Science
- ArXiv
- 2017

A thorough analysis of two widespread versions of gradient boosting is provided, and a general framework for studying these algorithms from the point of view of functional optimization is introduced.

Robust Loss Functions for Boosting

- Mathematics, Computer Science
- Neural Computation
- 2007

Numerical experiments illustrate that the proposed loss functions derived from the contamination models are useful for handling highly noisy data in comparison with other loss functions.

AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods

- Mathematics, Computer Science
- ArXiv
- 2013

This paper analyzes two well-known boosting methods, AdaBoost and Incremental Forward Stagewise Regression, by establishing their precise connections to the Mirror Descent algorithm, a first-order method in convex optimization.

Generalization Error and Algorithmic Convergence of Median Boosting

- Computer Science, Mathematics
- NIPS
- 2004

This paper extends theoretical results obtained for AdaBoost to median boosting and to its localized variant, showing that the algorithm can converge to the maximum achievable margin within a preset precision in a finite number of steps.

Gradient Boosting with Extreme Learning Machines for the Optimization of Nonlinear Functionals

- Computer Science
- 2019

This paper focuses on the gradient boosting technique combined with the ELM to address important instances of optimization problems such as optimal control of a complex system, multistage optimization and maximum likelihood estimation.

A geometric approach to leveraging weak learners

- Computer Science, Mathematics
- Theor. Comput. Sci.
- 2002

A new leveraging algorithm is introduced based on a natural potential function for improving the hypotheses generated by weak learning algorithms and is likely to perform better than AdaBoost on noisy data and with weak learners returning low confidence hypotheses.

Shrunken learning rates do not improve AdaBoost on benchmark datasets

- Computer Science
- 2001

It is concluded that reduced learning rates provide no statistically significant improvement on these benchmark datasets, and therefore cannot be recommended for use with boosted decision trees on similar datasets.

#### References

SHOWING 1-10 OF 21 REFERENCES

Greedy function approximation: A gradient boosting machine.

- Mathematics
- 2001

Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions…

A Geometric Approach to Leveraging Weak Learners

- Computer Science
- EuroCOLT
- 1999

A new leveraging algorithm is introduced based on a natural potential function; it has bounds that are incomparable to AdaBoost's, and its empirical performance is similar to AdaBoost's.

Special Invited Paper-Additive logistic regression: A statistical view of boosting

- Mathematics
- 2000

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data…

Boosting in the Limit: Maximizing the Margin of Learned Ensembles

- Computer Science
- AAAI/IAAI
- 1998

The crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open, and it is concluded that no simple version of the minimum-margin story can be complete.

Experiments with a New Boosting Algorithm

- Computer Science
- ICML
- 1996

This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.

Boosting Decision Trees

- Computer Science
- NIPS
- 1995

A constructive, incremental learning system for regression problems is described that models data by means of locally linear experts that do not compete for data during learning; asymptotic results are derived for this method.

A decision-theoretic generalization of on-line learning and an application to boosting

- Computer Science
- EuroCOLT
- 1995

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. The multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems.

Improved Boosting Algorithms using Confidence-Rated Predictions

- Mathematics, Computer Science
- COLT
- 1998

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…

An Adaptive Version of the Boost by Majority Algorithm

- Mathematics, Computer Science
- COLT '99
- 1999

The paper describes two methods for finding approximate solutions to the differential equations: one that results in a provably polynomial-time algorithm, and one based on the Newton-Raphson minimization procedure, which is much more efficient in practice but is not known to be polynomial.