Corpus ID: 6101385

Boosting Algorithms as Gradient Descent

Llew Mason, Jonathan Baxter, Peter L. Bartlett, Marcus Frean
We provide an abstract characterization of boosting algorithms as gradient descent on cost functionals in an inner-product function space. We prove convergence of these functional-gradient-descent algorithms under quite weak conditions. Following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such…
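The functional-gradient-descent view in the abstract can be illustrated with a small sketch. It assumes one-dimensional inputs, labels in {-1, +1}, decision stumps as the weak learners, the exponential margin cost c(z) = exp(-z), the empirical inner product <F, G> = (1/m) Σ F(x_i) G(x_i), and a fixed step size; these choices and the function names are illustrative assumptions, not the paper's exact algorithm or step-size rule. Each round picks the stump that maximises -<∇C(F), f>, and the loop stops when no stump gives a descent direction.

```python
import math


def stump(theta, s):
    """Decision stump f(x) = s if x > theta else -s, with s in {+1, -1}."""
    return lambda x: s if x > theta else -s


def cost(F, xs, ys):
    """Exponential margin cost C(F) = (1/m) * sum_i exp(-y_i F(x_i))."""
    return sum(math.exp(-y * F(x)) for x, y in zip(xs, ys)) / len(xs)


def functional_gradient_boost(xs, ys, rounds=20, step=0.1):
    """Gradient descent on C(F) over the span of decision stumps (a sketch).

    With the empirical inner product, -<grad C(F), f> is proportional to
    sum_i w_i * y_i * f(x_i) with weights w_i = exp(-y_i F(x_i)).
    """
    margins = [0.0] * len(xs)  # y_i * F(x_i); F starts as the zero function
    pts = sorted(set(xs))
    # Candidate thresholds: one below every point, then midpoints between points.
    thetas = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    ensemble = []  # (step size, stump) pairs
    for _ in range(rounds):
        w = [math.exp(-m) for m in margins]
        best, best_score = None, 0.0
        for theta in thetas:
            for s in (1, -1):
                f = stump(theta, s)
                score = sum(wi * y * f(x) for wi, x, y in zip(w, xs, ys))
                if score > best_score:
                    best, best_score = f, score
        if best is None:
            # No stump has -<grad C(F), f> > 0: no descent direction remains.
            break
        ensemble.append((step, best))
        margins = [m + step * y * best(x) for m, x, y in zip(margins, xs, ys)]
    return lambda x: sum(a * f(x) for a, f in ensemble)


xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
F = functional_gradient_boost(xs, ys)
print(cost(F, xs, ys))  # lower than 1.0, the cost of the zero function
```

With the exponential cost, the weights w_i play the role of AdaBoost's example weights, which is one way to see how a boosting reweighting scheme arises from a choice of cost function.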
Cost-sensitive boosting algorithms as gradient descent
It is shown that several typical cost-sensitive boosting (CSB) methods can also be viewed as gradient descent for minimizing a unified objective function, and a general greedy boosting procedure is deduced.
Historical Gradient Boosting Machine
The Historical Gradient Boosting Machine is introduced with the objective of improving the convergence speed of gradient boosting; it incorporates both the accumulated previous gradients and the current gradient into the computation of the descent direction in function space.
Accelerated gradient boosting
It is empirically shown that accelerated gradient boosting (AGB) is less sensitive to the shrinkage parameter and outputs predictors that are considerably sparser in the number of trees, while retaining the exceptional performance of gradient boosting.
Optimization by gradient boosting
A thorough analysis of two widespread versions of gradient boosting is provided, and a general framework for studying these algorithms from the point of view of functional optimization is introduced.
Robust Loss Functions for Boosting
Numerical experiments illustrate that the proposed loss functions, derived from contamination models, are useful for handling highly noisy data compared with other loss functions.
AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods
This paper analyzes two well-known boosting methods, AdaBoost and Incremental Forward Stagewise Regression, by establishing their precise connections to the Mirror Descent algorithm, a first-order method in convex optimization.
Generalization Error and Algorithmic Convergence of Median Boosting
  • B. Kégl
  • Computer Science, Mathematics
  • NIPS
  • 2004
This paper extends theoretical results obtained for AdaBoost to median boosting and to its localized variant, showing that the algorithm can converge to the maximum achievable margin within a preset precision in a finite number of steps.
Gradient Boosting with Extreme Learning Machines for the Optimization of Nonlinear Functionals
This paper focuses on the gradient boosting technique combined with extreme learning machines (ELMs) to address important instances of optimization problems, such as optimal control of a complex system, multistage optimization and maximum likelihood estimation.
A geometric approach to leveraging weak learners
A new leveraging algorithm, based on a natural potential function, is introduced for improving the hypotheses generated by weak learning algorithms; it is likely to perform better than AdaBoost on noisy data and with weak learners that return low-confidence hypotheses.
Shrunken learning rates do not improve AdaBoost on benchmark datasets
Reduced learning rates provide no statistically significant improvement on these benchmark datasets, so they cannot be recommended for use with boosted decision trees on similar datasets.


Greedy function approximation: A gradient boosting machine.
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions…
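For squared-error loss, optimization in function space reduces to repeatedly fitting a base learner to the current residuals, because the negative gradient of (1/2)(y - F(x))^2 with respect to F(x) is y - F(x). The sketch below assumes one-dimensional inputs, least-squares regression stumps as base learners, a shrinkage factor nu = 0.1 and 50 rounds; these settings and the function names are illustrative assumptions rather than the paper's reference implementation.

```python
def fit_stump(xs, rs):
    """Least-squares regression stump: split at theta, predict the mean on each side."""
    best = None
    for theta in sorted(set(xs))[:-1]:  # needs at least two distinct x values
        left = [r for x, r in zip(xs, rs) if x <= theta]
        right = [r for x, r in zip(xs, rs) if x > theta]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, theta, lm, rm)
    _, theta, lm, rm = best
    return lambda x: lm if x <= theta else rm


def ls_gradient_boost(xs, ys, rounds=50, nu=0.1):
    """Stagewise additive fit under squared loss, using shrunken stump steps."""
    f0 = sum(ys) / len(ys)  # F_0: the constant that minimises squared loss
    stumps = []

    def F(x):
        return f0 + nu * sum(h(x) for h in stumps)

    for _ in range(rounds):
        # Negative gradient of (1/2)(y - F(x))^2 w.r.t. F(x) is the residual y - F(x).
        residuals = [y - F(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return F


xs = [1, 2, 3, 4]
ys = [1.0, 1.0, 3.0, 3.0]
F = ls_gradient_boost(xs, ys)
print([round(F(x), 3) for x in xs])
```

On this toy data each round removes a fraction nu of the remaining residual, so the fit approaches the targets geometrically rather than in one step; that slower approach is the effect of the shrinkage parameter.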
A Geometric Approach to Leveraging Weak Learners
A new leveraging algorithm based on a natural potential function is introduced; its bounds are incomparable to AdaBoost's, and its empirical performance is similar to AdaBoost's.
Special Invited Paper-Additive logistic regression: A statistical view of boosting
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data…
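The reweighting step mentioned above can be sketched for Discrete AdaBoost, assuming labels and predictions in {-1, +1} and a weighted error strictly between 0 and 1. The round computes err, the coefficient c = log((1 - err)/err), multiplies the weights of misclassified points by exp(c) and renormalizes. The function name is hypothetical, and this shows only one round, not a full boosting loop.

```python
import math


def discrete_adaboost_reweight(weights, ys, preds):
    """One Discrete AdaBoost round; returns (c, normalised new weights)."""
    total = sum(weights)
    err = sum(w for w, y, p in zip(weights, ys, preds) if y != p) / total
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    c = math.log((1.0 - err) / err)
    # Misclassified points are up-weighted by exp(c); correct ones are unchanged.
    new = [w * math.exp(c) if y != p else w for w, y, p in zip(weights, ys, preds)]
    s = sum(new)
    return c, [w / s for w in new]


ys = [1, 1, -1, -1]
preds = [1, 1, -1, 1]  # the classifier errs on the last point only
c, w = discrete_adaboost_reweight([0.25] * 4, ys, preds)
print(c, w)
```

After this update, the classifier that was just fitted has weighted error exactly 1/2, because the misclassified points and the correctly classified points each carry total weight proportional to 1 - err. The next classifier is therefore fitted to data on which the previous one is uninformative.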
Boosting in the Limit: Maximizing the Margin of Learned Ensembles
The crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open, and it is concluded that no simple version of the minimum-margin story can be complete.
Experiments with a New Boosting Algorithm
This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.
Boosting Decision Trees
A constructive, incremental learning system for regression problems that models data by means of locally linear experts that does not compete for data during learning and derives asymptotic results for this method.
A decision-theoretic generalization of on-line learning and an application to boosting
The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems.
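The adapted multiplicative weight-update rule can be sketched as the Hedge(beta) algorithm: keep a weight per option, play the normalized weights as a distribution, and multiply each weight by beta raised to that option's loss. The sketch assumes losses in [0, 1], uniform initial weights and beta = 0.5; the function name and toy losses are illustrative assumptions. The check at the end compares the cumulative loss with the bound (ln N + L_best ln(1/beta)) / (1 - beta) that holds for uniform initial weights.

```python
import math


def hedge(loss_rounds, beta=0.5):
    """Hedge(beta): multiplicative weight update over N options.

    loss_rounds[t][i] is the loss in [0, 1] of option i at trial t.
    Returns the algorithm's cumulative expected loss and the final distribution.
    """
    n = len(loss_rounds[0])
    w = [1.0 / n] * n  # uniform initial weights
    total = 0.0
    for losses in loss_rounds:
        s = sum(w)
        p = [wi / s for wi in w]  # distribution played at this trial
        total += sum(pi * li for pi, li in zip(p, losses))
        w = [wi * beta ** li for wi, li in zip(w, losses)]  # w_i <- w_i * beta^loss_i
    s = sum(w)
    return total, [wi / s for wi in w]


rounds = [[0.0, 1.0]] * 3  # option 0 never loses, option 1 always does
loss, p = hedge(rounds, beta=0.5)
best_loss = 0.0  # cumulative loss of the best single option (option 0)
bound = (math.log(2) + best_loss * math.log(1 / 0.5)) / (1 - 0.5)
print(loss, bound, p)
```

The weight on the losing option halves at every trial, so the distribution concentrates on the better option while the cumulative loss stays within the bound.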
Improved Boosting Algorithms using Confidence-Rated Predictions
We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…
An Adaptive Version of the Boost by Majority Algorithm
  • Y. Freund
  • Mathematics, Computer Science
  • COLT '99
  • 1999
The paper describes two methods for finding approximate solutions to the differential equations: a method that results in a provably polynomial-time algorithm, and a method based on the Newton-Raphson minimization procedure, which is much more efficient in practice but is not known to be polynomial.