Logistic Regression, AdaBoost and Bregman Distances

@article{Collins2004LogisticRA,
  title={Logistic Regression, AdaBoost and Bregman Distances},
  author={Michael Collins and Robert E. Schapire and Yoram Singer},
  journal={Machine Learning},
  year={2004},
  volume={48},
  pages={253-285}
}
We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms are iterative and can be…
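As a point of reference (a sketch of the standard formulation, not a quotation from the paper), the two objectives being unified are the exponential loss minimized by AdaBoost and the logistic loss of logistic regression, both over a linear combination f(x) = \sum_j \lambda_j h_j(x) of base hypotheses:

  \mathrm{ExpLoss}(\lambda) = \sum_i \exp\!\big(-y_i f(x_i)\big),
  \qquad
  \mathrm{LogLoss}(\lambda) = \sum_i \ln\!\big(1 + \exp(-y_i f(x_i))\big).

The paper recasts minimizing each loss as a projection problem with respect to a Bregman distance, which for a convex generator F has the form

  D_F(p \,\|\, q) = F(p) - F(q) - \nabla F(q)\cdot(p - q);

roughly, an unnormalized relative entropy corresponds to the exponential loss and a binary relative entropy to the logistic loss, with the precise potentials and constraint sets spelled out in the paper itself.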

Citations

A comparison of numerical optimizers for logistic regression
Logistic regression is a workhorse of statistics and is closely related to methods used in Machine Learning, including the Perceptron and the Support Vector Machine. This note compares eight…
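For concreteness, a minimal sketch of one such optimizer, Newton's method (iteratively reweighted least squares) applied to the binary logistic model, is given below; the fixed iteration count and the small ridge term added for numerical stability are illustrative choices, not taken from the note.

import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Newton's method (IRLS) for binary logistic regression.

    X: (n, d) design matrix; y: labels in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        R = p * (1.0 - p)                      # curvature weights
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        H = X.T @ (X * R[:, None])             # negative Hessian of the log-likelihood
        w += np.linalg.solve(H + 1e-8 * np.eye(len(w)), grad)  # Newton step
    return w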
Bregman Divergences and Surrogates for Learning
  • R. Nock, F. Nielsen
  • Mathematics, Medicine
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2009
This paper addresses the problem for a wide set of surrogates lying at the intersection of classification-calibrated surrogates and those of Murata et al. (2004), and gives a minimization algorithm provably converging to the minimum of any such surrogate.
On the Convergence Properties of Optimal AdaBoost
This paper establishes the convergence of "Optimal AdaBoost," a term coined by Rudin, Daubechies, and Schapire in 2004, and proves the convergence, with the number of rounds, of the classifier itself, its generalization error, and its resulting margins for fixed data sets, under certain reasonable conditions.
Surrogate maximization/minimization algorithms for AdaBoost and the logistic regression model
This paper solves the boosting problem by proposing SM algorithms for the corresponding optimization problem and derives an SM algorithm that can be shown to be identical to the algorithm proposed by Collins et al. (2002) based on Bregman distances.
Intrinsic Geometries in Learning
This paper shows that many learning algorithms, including various boosting algorithms for linear separators, the most popular top-down decision-tree induction algorithms, and some on-line learning algorithms, are spawns of a generalization of Amari's natural gradient to some particular non-Riemannian spaces.
Ensembles of Partially Trained SVMs with Multiplicative Updates
This paper shows that the multiplicative update of SVMs can be formulated as a Bregman projection problem, so that the learning rate can be adapted automatically; the proposed ensemble has efficient training and comparable or even better accuracy than the best-tuned soft-margin SVM.
Boosting With the L2 Loss
This article investigates a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2-loss function. Like other boosting…
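As a minimal sketch of that functional gradient descent view (an illustration, not the article's exact procedure): with the L2 loss, the negative functional gradient at the current fit is simply the residual vector, so each round fits the base learner to the residuals and adds a damped copy of it to the ensemble. The shallow regression tree, shrinkage factor, and round count below are illustrative choices.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def l2boost(X, y, n_rounds=100, nu=0.1):
    # y: numpy array of real-valued targets
    offset = y.mean()
    F = np.full(len(y), offset)          # current ensemble prediction
    learners = []
    for _ in range(n_rounds):
        residual = y - F                 # negative gradient of the L2 loss
        h = DecisionTreeRegressor(max_depth=2)
        h.fit(X, residual)               # fit the base learner to residuals
        F += nu * h.predict(X)           # damped functional gradient step
        learners.append(h)
    return offset, learners, nu

def l2boost_predict(offset, learners, nu, X):
    return offset + nu * sum(h.predict(X) for h in learners)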
Bundle Methods for Regularized Risk Minimization
This paper describes the theory and implementation of a scalable and modular convex solver for regularized risk minimization problems; the solver can be parallelized on a cluster of workstations, allows for data locality, and can deal with regularizers such as L1 and L2 penalties.
Boosting as a Regularized Path to a Maximum Margin Classifier
Building on recent work by Efron et al., it is shown that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector, and that as the constraint is relaxed the solution converges (in the separable case) to an "l1-optimal" separating hyperplane.
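In symbols (a sketch of the usual setup, with base hypotheses h_j and coefficient vector \lambda, not a quotation from the paper):

  \min_{\lambda} \; \sum_i \exp\!\Big(-y_i \sum_j \lambda_j h_j(x_i)\Big)
  \quad \text{subject to} \quad \|\lambda\|_1 \le c,

and as the budget c grows, in the separable case the normalized solution \lambda / \|\lambda\|_1 approaches the separating hyperplane that maximizes the minimum l1 margin.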
Information Geometry of U-Boost and Bregman Divergence
It is observed that two adjacent classifiers and the initial classifier are associated with a right triangle in the scale induced by the Bregman divergence, called the Pythagorean relation; this leads to a mild convergence property of the U-Boost algorithm, as seen in the expectation-maximization algorithm.
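The Pythagorean relation referred to here is, in its standard form for Bregman divergences (stated as background, not quoted from the paper): if q^* = \arg\min_{q \in C} D_F(q \,\|\, p) is the Bregman projection of p onto a convex set C, then for every r \in C

  D_F(r \,\|\, p) \;\ge\; D_F(r \,\|\, q^*) + D_F(q^* \,\|\, p),

with equality when C is affine; this inequality is the analogue of the right triangle mentioned above.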

References

Showing 1-10 of 106 references
Additive models, boosting, and inference for generalized divergences
A framework for designing incremental learning algorithms derived from generalized entropy functionals is presented, based on the use of Bregman divergences together with the associated class of additive models constructed using the Legendre transform.
Duality and Auxiliary Functions for Bregman Distances
In this paper, the authors formulate and prove a convex duality theorem for minimizing a general class of Bregman distances subject to linear constraints. The duality result is then used…
Special Invited Paper-Additive logistic regression: A statistical view of boosting
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data…
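To make the reweighting scheme concrete, here is a minimal sketch of discrete AdaBoost with decision stumps; the scikit-learn stump learner, round count, and clipping constant are illustrative choices rather than anything prescribed by the cited paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    # y: numpy array of labels in {-1, +1}
    n = len(y)
    D = np.full(n, 1.0 / n)                       # example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=D)          # train on the reweighted data
        pred = stump.predict(X)
        eps = np.clip(np.sum(D * (pred != y)), 1e-12, 1 - 1e-12)  # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)     # hypothesis weight
        D *= np.exp(-alpha * y * pred)            # upweight mistakes, downweight correct examples
        D /= D.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(votes)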
Improved Boosting Algorithms Using Confidence-rated Predictions
We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…
A decision-theoretic generalization of on-line learning and an application to boosting
The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
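The multiplicative weight-update rule in question is the Hedge algorithm; a minimal sketch is given below, where the loss-matrix interface and the value of beta are illustrative choices.

import numpy as np

def hedge(loss_matrix, beta=0.9):
    """Hedge: multiplicative-weights allocation over N experts.

    loss_matrix[t, i] is the loss of expert i at round t, assumed to lie in [0, 1].
    """
    T, N = loss_matrix.shape
    w = np.full(N, 1.0 / N)
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()                      # allocation for this round
        total_loss += p @ loss_matrix[t]     # learner's expected loss this round
        w *= beta ** loss_matrix[t]          # multiplicative weight update
    return total_loss, w / w.sum()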
Statistical Learning Algorithms Based on Bregman Distances
We present a class of statistical learning algorithms formulated in terms of minimizing Bregman distances, a family of generalized entropy measures associated with convex functions. The inductive…
Boosting as entropy projection
It is shown how AdaBoost's choice of the new distribution can be seen as an approximate solution to the following problem: find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mistakes of the current weak hypothesis.
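In symbols (a sketch of the standard statement of this projection): the new distribution solves

  D_{t+1} = \arg\min_{D} \; \mathrm{RE}(D \,\|\, D_t)
  \quad \text{s.t.} \quad \sum_i D(i)\, y_i h_t(x_i) = 0, \;\; \sum_i D(i) = 1, \;\; D(i) \ge 0,

where \mathrm{RE}(D \,\|\, D') = \sum_i D(i)\ln\big(D(i)/D'(i)\big) is the relative entropy; working out the Lagrangian gives an update of the exponential form D_{t+1}(i) \propto D_t(i)\exp(-\alpha_t\, y_i h_t(x_i)), which is the shape of AdaBoost's reweighting rule.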
Soft Margins for AdaBoost
It is found that AdaBoost asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns that, interestingly, are very similar to support vectors.
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
The bounds suggest that the losses of the two algorithms are in general incomparable, but that EG± has a much smaller loss if only a few components of the input are relevant for the predictions; the bounds are quite tight already on simple artificial data.
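For reference, the two updates being compared are, in their simplest forms (stated as background, not quoted from the paper):

  \text{GD:}\;\; w_{t+1} = w_t - \eta\, \nabla L_t(w_t),
  \qquad
  \text{EG:}\;\; w_{t+1,i} = \frac{w_{t,i}\, \exp\!\big(-\eta\, \partial L_t / \partial w_i\big)}{\sum_j w_{t,j}\, \exp\!\big(-\eta\, \partial L_t / \partial w_j\big)},

with EG± maintaining two nonnegative weight vectors w^+ and w^- and predicting with their difference, so that negative weights can be represented.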
Arcing the edge
Recent work has shown that adaptively reweighting the training set, growing a classifier using the new weights, and combining the classifiers constructed to date can significantly decrease…