# On the Convergence of Boosting Procedures

@inproceedings{Zhang2003OnTC, title={On the Convergence of Boosting Procedures}, author={Tong Zhang and Bin Yu}, booktitle={ICML}, year={2003} }

A boosting algorithm seeks to empirically minimize a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to samples updated according to the previous iterations. This paper studies the convergence of boosting when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the…
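The greedy additive procedure the abstract describes can be sketched in code. The following is a minimal, illustrative implementation assuming squared loss and decision stumps as the base learner family; the function names (`fit_stump`, `boost`) and the step size are assumptions for the sketch, not details from the paper.

```python
import numpy as np

def fit_stump(x, residual):
    """Greedy base learner: pick the threshold split minimizing
    squared error against the current residual."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lm, rm = left.mean(), right.mean()
        err = ((left - lm) ** 2).sum() + ((right - rm) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda z: np.where(z <= t, lm, rm)

def boost(x, y, n_iter=50, step=0.5):
    """Build the additive estimator F_m = F_{m-1} + step * h_m,
    where each h_m is a base learner fit to the current residuals."""
    pred = np.zeros_like(y, dtype=float)
    learners = []
    for _ in range(n_iter):
        h = fit_stump(x, y - pred)   # greedy step on the empirical loss
        pred += step * h(x)
        learners.append(h)
    return lambda z: step * sum(h(z) for h in learners)

x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x)
f = boost(x, y)
print(np.mean((f(x) - y) ** 2))  # empirical loss after 50 greedy steps
```

Each iteration adds one element of the base family to the expansion, so the estimator stays in the linear span of the basis functions; the paper's convergence question is whether the empirical loss of such greedy iterates approaches its infimum over that span.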


#### 2 Citations

Factorized MultiClass Boosting

- Computer Science, Mathematics
- ArXiv
- 2019

A new approach to the multiclass classification problem that decomposes it into a series of regression tasks solved with CART trees, reaching high-quality results in significantly less time and without class re-balancing.

Survival regression with accelerated failure time model in XGBoost

- Computer Science, Mathematics
- ArXiv
- 2020

This work proposes and implements loss functions for learning accelerated failure time (AFT) models in XGBoost, increasing support for survival modeling under different kinds of label censoring; it is the first implementation of AFT that utilizes the processing power of NVIDIA GPUs.

#### References

Showing 1-10 of 25 references

Boosting With the L2 Loss

- Mathematics
- 2003

This article investigates a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2-loss function. Like other boosting…

Greedy function approximation: A gradient boosting machine.

- Mathematics
- 2001

Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions…

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

- Computer Science, Mathematics
- COLT
- 1997

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.

A decision-theoretic generalization of on-line learning and an application to boosting

- Computer Science
- EuroCOLT
- 1995

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.

Special Invited Paper-Additive logistic regression: A statistical view of boosting

- Mathematics
- 2000

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data…

Statistical behavior and consistency of classification methods based on convex risk minimization

- Mathematics
- 2003

We study how closely the optimal Bayes error rate can be approached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification…

Logistic Regression, AdaBoost and Bregman Distances

- Mathematics, Computer Science
- Machine Learning
- 2004

A unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances, and a parameterized family of algorithms that includes both a sequential- and a parallel-update algorithm as special cases is described, thus showing how the sequential and parallel approaches can themselves be unified.

SOME INFINITY THEORY FOR PREDICTOR ENSEMBLES

- Mathematics
- 2000

To dispel some of the mystery about what makes tree ensembles work, they are examined in distribution space, i.e., the limit case of "infinite" sample size. It is shown that the simplest kind of trees…

A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training

- Mathematics
- 1992

A general convergence criterion for certain iterative sequences in Hilbert space is presented. For an important subclass of these sequences, estimates of the rate of convergence are given. Under very…

Boosting the margin: A new explanation for the effectiveness of voting methods

- Mathematics, Computer Science
- ICML
- 1997

It is shown that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error.