Corpus ID: 12165073

On the Convergence of Boosting Procedures

@inproceedings{Zhang2003OnTC,
  title={On the Convergence of Boosting Procedures},
  author={Tong Zhang and Bin Yu},
  booktitle={ICML},
  year={2003}
}
A boosting algorithm seeks to minimize a loss function empirically in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. This paper studies the convergence of boosting when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss over the linear span.
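To make the iterative, additive construction described above concrete, here is a minimal Python sketch of greedy stagewise fitting over the linear span of a base class. It assumes squared loss and depth-1 regression trees from scikit-learn as the base learner; both choices are illustrative and are not taken from the paper.

```python
# Illustrative sketch only: greedy stagewise minimization of a loss over
# the linear span of a base-learner class (squared loss, depth-1 trees).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def greedy_boost(X, y, n_steps=100, step_size=0.1):
    """Build an additive fit F(x) = sum_m step_size * h_m(x) greedily."""
    F = np.zeros(len(y))                  # current additive estimator
    ensemble = []
    for _ in range(n_steps):
        residual = y - F                  # negative gradient of 0.5*(y - F)^2
        h = DecisionTreeRegressor(max_depth=1).fit(X, residual)
        F += step_size * h.predict(X)     # small step along the new basis function
        ensemble.append(h)
    return ensemble
```

The small fixed step size plays the role of the restricted step (early stopping / shrinkage) that the convergence analysis relies on.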
Citations

Factorized MultiClass Boosting
TLDR: A new approach to the multiclass classification problem that decomposes it into a series of regression tasks solved with CART trees, reaching high-quality results in significantly less time without class re-balancing.
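The summary above gives only the outline. The following rough sketch shows one way a K-class problem can be reduced to per-class regression tasks fit with CART trees (scikit-learn's DecisionTreeRegressor); the actual factorization used in that paper may differ.

```python
# Rough sketch (not necessarily the paper's exact factorization): reduce a
# K-class problem to K regression tasks, each fit with a CART tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_per_class_regressors(X, y, n_classes):
    models = []
    for k in range(n_classes):
        target = (y == k).astype(float)   # regression target for class k
        models.append(DecisionTreeRegressor(max_depth=3).fit(X, target))
    return models

def predict(models, X):
    scores = np.column_stack([m.predict(X) for m in models])
    return scores.argmax(axis=1)          # class with the largest regressed score
```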
Survival regression with accelerated failure time model in XGBoost
TLDR: This work proposes and implements loss functions for learning accelerated failure time (AFT) models in XGBoost, broadening its support for survival modeling with different kinds of label censoring; it is the first implementation of AFT that utilizes the processing power of NVIDIA GPUs.
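As a usage illustration, here is a minimal sketch of training with XGBoost's AFT objective on synthetic, right-censored data. Parameter names follow those documented in recent XGBoost releases; the data and hyperparameter values are arbitrary.

```python
# Minimal sketch of XGBoost's AFT survival objective on synthetic data.
# Right-censored observations get an upper bound of +inf.
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 5)
lower = np.random.exponential(10, 200)                 # observed time / lower bound
upper = np.where(np.random.rand(200) < 0.3, np.inf, lower)  # ~30% right-censored

dtrain = xgb.DMatrix(X)
dtrain.set_float_info('label_lower_bound', lower)
dtrain.set_float_info('label_upper_bound', upper)

params = {'objective': 'survival:aft',
          'eval_metric': 'aft-nloglik',
          'aft_loss_distribution': 'normal',
          'aft_loss_distribution_scale': 1.0,
          'tree_method': 'hist'}
model = xgb.train(params, dtrain, num_boost_round=50)
```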

References

Showing 1-10 of 25 references
Boosting With the L2 Loss
This article investigates a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2 loss function. Like other boosting …
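As a worked step showing why L2Boost amounts to repeated residual fitting: with squared loss, the negative functional gradient at stage m is the ordinary residual, so each iteration refits the residuals with the base learner and takes a small step of size ν.

```latex
\[
  -\left.\frac{\partial}{\partial F}\,\tfrac{1}{2}\bigl(y_i - F\bigr)^{2}\right|_{F = F_{m-1}(x_i)}
  = y_i - F_{m-1}(x_i),
  \qquad
  F_m(x) = F_{m-1}(x) + \nu\,\hat h_m(x).
\]
```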
Greedy function approximation: A gradient boosting machine.
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions …
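In the notation commonly used for gradient boosting, the stagewise additive expansion referred to above chooses, at stage m, the base learner and coefficient that most reduce the empirical loss and adds them to the current fit:

```latex
\[
  (\beta_m, \gamma_m)
  = \arg\min_{\beta,\,\gamma} \sum_{i=1}^{n} L\bigl(y_i,\; F_{m-1}(x_i) + \beta\, h(x_i;\gamma)\bigr),
  \qquad
  F_m(x) = F_{m-1}(x) + \beta_m\, h(x;\gamma_m).
\]
```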
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
TLDR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting; it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems.
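For reference, the multiplicative weight-update rule mentioned above takes the following form in AdaBoost, where ε_t is the weighted error of the weak hypothesis h_t on the current distribution:

```latex
\[
  w_i^{(t+1)} \;\propto\; w_i^{(t)} \exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr),
  \qquad
  \alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}.
\]
```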
Additive logistic regression: A statistical view of boosting (Special Invited Paper)
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data …
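The statistical view in question rests on the observation that the population minimizer of the exponential loss is half the log-odds, so boosting with that loss fits an additive logistic model:

```latex
\[
  F^{*}(x) = \arg\min_{F}\; \mathbb{E}\bigl[e^{-yF}\,\big|\,x\bigr]
           = \tfrac{1}{2}\,\log\frac{P(y=1\mid x)}{P(y=-1\mid x)}.
\]
```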
Statistical behavior and consistency of classification methods based on convex risk minimization
We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification …
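The setup can be stated in one line: classify by the sign of f and replace the 0-1 risk with a convex surrogate risk, which upper-bounds it whenever the surrogate φ satisfies φ(v) ≥ 1{v ≤ 0} (e.g. the exponential loss):

```latex
\[
  R(f) = P\bigl(\operatorname{sign} f(X) \ne Y\bigr)
  \;\le\;
  R_{\phi}(f) = \mathbb{E}\,\phi\bigl(Y f(X)\bigr).
\]
```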
Logistic Regression, AdaBoost and Bregman Distances
TLDR: A unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances; a parameterized family of algorithms that includes both a sequential-update and a parallel-update algorithm as special cases is described, showing how the sequential and parallel approaches can themselves be unified.
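For reference, the Bregman distance generated by a differentiable convex function F, the object in terms of which both learning problems are cast, is

```latex
\[
  D_F(p, q) \;=\; F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle .
\]
```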
Some Infinity Theory for Predictor Ensembles
To dispel some of the mystery about what makes tree ensembles work, they are looked at in distribution space, i.e., the limit case of "infinite" sample size. It is shown that the simplest kind of trees …
A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training
A general convergence criterion for certain iterative sequences in Hilbert space is presented. For an important subclass of these sequences, estimates of the rate of convergence are given. Under very …
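The rate estimates in question have the following flavor (constants suppressed): if f lies in the closed convex hull of a uniformly bounded dictionary in a Hilbert space, the k-term greedy approximant f_k satisfies

```latex
\[
  \|f - f_k\|_{\mathcal H}^{2} \;\le\; \frac{c}{k},
\]
```

with c depending only on the bound on the dictionary elements and on f.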
Boosting the margin: A new explanation for the effectiveness of voting methods
TLDR: It is shown that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error.
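Up to logarithmic factors, the resulting bounds have roughly the following form: for the normalized vote f, any margin threshold θ > 0, base-class VC dimension d, and sample size n,

```latex
\[
  P_{\mathcal D}\bigl[\,y f(x) \le 0\,\bigr]
  \;\le\;
  P_{S}\bigl[\,y f(x) \le \theta\,\bigr]
  \;+\; \tilde O\!\left(\sqrt{\tfrac{d}{n\,\theta^{2}}}\right),
\]
```

where the first probability is over the true distribution and the second is the empirical fraction of training examples with margin at most θ.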