Random classification noise defeats all convex potential boosters

@article{Long2008RandomCN,
  title={Random classification noise defeats all convex potential boosters},
  author={Philip M. Long and Rocco A. Servedio},
  journal={Machine Learning},
  year={2008},
  volume={78},
  pages={287-304}
}
A broad class of boosting algorithms can be interpreted as performing coordinate-wise gradient descent to minimize some potential function of the margins of a data set. This class includes AdaBoost, LogitBoost, and other widely used and well-studied boosters. In this paper we show that for a broad class of convex potential functions, any such boosting algorithm is highly susceptible to random classification noise. We do this by showing that for any such booster and any nonzero random… 
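
To make the setting concrete, the sketch below runs a generic convex potential booster, i.e. coordinate-wise gradient descent on the average potential of the margins, on labels that have been flipped independently at a noise rate eta. It is an illustration only, not the authors' construction: the exponential potential stands in for any admissible convex phi, and the random pool of base classifiers, the noise rate, and the step size are all assumptions chosen for the example.

# Illustrative sketch (not the paper's construction): a convex potential booster
# as coordinate-wise gradient descent on (1/n) * sum_i phi(y_i * F(x_i)),
# trained on labels corrupted by random classification noise at rate eta.
import numpy as np

rng = np.random.default_rng(0)

def phi(z):            # convex potential; exp(-z) stands in for any admissible choice
    return np.exp(-z)

def phi_grad(z):
    return -np.exp(-z)

def potential_boost(H, y, rounds=200, step=0.1):
    """Coordinate-wise gradient descent on the average potential of the margins.

    H : (n, m) matrix of base-classifier predictions in {-1, +1}
    y : (n,) labels in {-1, +1}
    Returns the weight vector alpha over the m base classifiers.
    """
    n, m = H.shape
    alpha = np.zeros(m)
    for _ in range(rounds):
        margins = y * (H @ alpha)
        # gradient of (1/n) * sum_i phi(margin_i) with respect to each alpha_j
        grad = (phi_grad(margins) * y) @ H / n
        j = int(np.argmax(np.abs(grad)))      # steepest coordinate (Gauss-Southwell)
        alpha[j] -= step * grad[j]
    return alpha

# Toy data: base classifiers are signs of random linear projections (an assumption
# for illustration; the paper analyzes a specific hard distribution instead).
n, d, m = 500, 5, 25
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))
H = np.sign(X @ rng.normal(size=(d, m)))

eta = 0.2                                     # random classification noise rate
y_noisy = np.where(rng.random(n) < eta, -y, y)

alpha = potential_boost(H, y_noisy)
print("clean-label accuracy after boosting on noisy labels:",
      np.mean(np.sign(H @ alpha) == y))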

Noise peeling methods to improve boosting algorithms

Learning with Noisy Labels

The problem of binary classification in the presence of random classification noise is studied theoretically: the learner sees labels that have been independently flipped with some small probability, and methods already used in practice, such as biased SVM and weighted logistic regression, are shown to be provably noise-tolerant.
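
For orientation, one standard correction in this line of work, written in the usual notation and assuming the class-conditional flip rates are known, replaces the loss with an unbiased surrogate; minimizing the corrected loss on noisy labels then matches, in expectation, minimizing the original loss on clean labels:

\[
\tilde{\ell}(t, y) \;=\; \frac{(1-\rho_{-y})\,\ell(t, y) \;-\; \rho_{y}\,\ell(t, -y)}{1-\rho_{+1}-\rho_{-1}},
\qquad
\mathbb{E}_{\tilde{y}}\bigl[\tilde{\ell}(t, \tilde{y})\bigr] = \ell(t, y),
\]

where \(\rho_{+1} = \Pr(\tilde{y} = -1 \mid y = +1)\) and \(\rho_{-1} = \Pr(\tilde{y} = +1 \mid y = -1)\), with \(\rho_{+1} + \rho_{-1} < 1\).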

Cost-Sensitive Learning with Noisy Labels

The proposed methods are competitive with recently proposed methods for dealing with label noise on several benchmark data sets, and the analysis implies that methods already used in practice, such as biased SVM and weighted logistic regression, are provably noise-tolerant.

Non-convex boosting with minimum margin guarantees

This work introduces a new non-convex boosting algorithm, BrownBoost-δ, a noise-resistant booster that is able to significantly increase accuracy on a set of noisy classification problems, and consistently outperforms the original BrownBoost algorithm, AdaBoost, and LogitBoost on simulated and real data.

The Implicit Bias of Benign Overfitting

It is shown that for regression, benign overfitting is “biased” towards certain types of problems, in the sense that its existence on one learning problem precludes its existence on other learning problems.

Learning with Symmetric Label Noise: The Importance of Being Unhinged

It is shown that the optimal unhinged solution is equivalent to that of a strongly regularised SVM, and is the limiting solution for any convex potential; this implies that strong l2 regularisation makes most standard learners SLN-robust.
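
As a point of reference, stated in the usual notation rather than quoted from the paper, the unhinged loss and the symmetry property behind its robustness to symmetric label noise (SLN) are:

\[
\ell_{\mathrm{unh}}(y, v) = 1 - y v,
\qquad
\ell_{\mathrm{unh}}(+1, v) + \ell_{\mathrm{unh}}(-1, v) = 2 \quad \text{for all } v,
\]

so under symmetric label noise at rate \(\eta < 1/2\) the noisy risk is an increasing affine transform of the clean risk, and its minimizer is unchanged.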

Soft-max boosting

This paper proposes to replace the usual deterministic decision rule with a stochastic one, which yields a smooth risk (generalizing the expected binary loss and, more generally, the cost-sensitive loss), and provides a convergence analysis of the resulting algorithm.
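
One way to read the construction, sketched below with assumed details (the softmax decision rule, the cost matrix, and the temperature are illustrative, not taken from the paper): drawing the predicted class from a soft-max over the scores makes the expected cost a smooth function of the scores, so ordinary gradient methods apply.

# Illustrative sketch: a stochastic (soft-max) decision rule turns the expected
# cost-sensitive loss into a smooth function of the real-valued scores.
import numpy as np

def softmax(scores, temperature=1.0):
    z = scores / temperature
    z -= z.max(axis=1, keepdims=True)   # for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def smooth_risk(scores, labels, cost):
    """Expected cost when the predicted class is drawn from softmax(scores).

    scores : (n, K) real-valued scores
    labels : (n,) true classes in {0, ..., K-1}
    cost   : (K, K) matrix, cost[y, k] = cost of predicting k when the truth is y
    """
    p = softmax(scores)
    return float(np.mean(np.sum(p * cost[labels], axis=1)))

# Toy usage with the 0-1 cost: the smooth risk is the expected error rate of the
# stochastic rule, a differentiable surrogate for the binary loss.
rng = np.random.default_rng(0)
n, K = 200, 2
scores = rng.normal(size=(n, K))
labels = rng.integers(0, K, size=n)
print("smooth risk:", smooth_risk(scores, labels, 1.0 - np.eye(K)))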

Direct Optimization for Classification with Boosting

Novel boosting algorithms are designed that directly optimize non-convex performance measures, including the empirical classification error and margin functions, without resorting to any surrogates or approximations.

On the Error Resistance of Hinge Loss Minimization

If the data is linearly classifiable with a slightly non-trivial margin, and the class-conditional distributions are near-isotropic and log-concave, then surrogate loss minimization has negligible error on the uncorrupted data even when a constant fraction of examples are adversarially mislabeled.

Loss factorization, weakly supervised learning and label noise robustness

We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the loss.
...

References


Boosting in the presence of noise

A variant of the standard scenario for boosting in which the "weak learner" satisfies a slightly stronger condition than the usual weak learning guarantee is considered, and an efficient algorithm is given which can boost to arbitrarily high accuracy in the presence of classification noise.

Special Invited Paper-Additive logistic regression: A statistical view of boosting

This work shows that this seemingly mysterious phenomenon of boosting can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood, and develops more direct approximations and shows that they exhibit nearly identical results to boosting.
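
The central population-level identity behind this statistical view, in the standard notation rather than quoted from the paper, is that the exponential criterion driving AdaBoost is minimized by half the log-odds:

\[
F^{*}(x) \;=\; \arg\min_{F}\, \mathbb{E}\bigl[e^{-yF(x)} \,\big|\, x\bigr]
\;=\; \tfrac{1}{2}\,\log \frac{P(y=+1 \mid x)}{P(y=-1 \mid x)},
\]

which is why stagewise fitting of the exponential criterion parallels additive logistic regression, and why LogitBoost can work with the binomial log-likelihood directly.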

Adaptive Martingale Boosting

An adaptive variant of the martingale boosting algorithm that inherits the desirable properties of the original [LS05] algorithm, such as random classification noise tolerance, and has other advantages besides adaptiveness: it requires polynomially fewer calls to the weak learner than the original algorithm, and it can be used with confidence-rated weak hypotheses that output real values rather than Boolean predictions.

Statistical behavior and consistency of classification methods based on convex risk minimization

This study sheds light on the good performance of some recently proposed linear classification methods including boosting and support vector machines and shows their limitations and suggests possible improvements.

The Consistency of Greedy Algorithms for Classification

A class of algorithms for classification based on sequential greedy minimization of a convex upper bound on the 0-1 loss function is considered, and Boosting based on the logistic function is shown to provide faster rates of convergence than Boosting based on the exponential function used in AdaBoost.

An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting, while with substantial classification noise, bagging is much better than boosting and sometimes better than randomization.

A decision-theoretic generalization of on-line learning and an application to boosting

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
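
For reference, the multiplicative weight update at the core of this model (Hedge(β), written here in the usual notation rather than quoted from the paper) is

\[
p_i^{t} = \frac{w_i^{t}}{\sum_{j} w_j^{t}},
\qquad
w_i^{t+1} = w_i^{t}\,\beta^{\,\ell_i^{t}},
\qquad \beta \in (0, 1),
\]

where \(\ell_i^{t} \in [0,1]\) is the loss of strategy i at trial t; the boosting application is obtained with the training examples playing the role of the strategies.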

Experiments with a New Boosting Algorithm

This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.

Boosting with early stopping: Convergence and consistency

This paper studies numerical convergence, consistency and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions, and leads to a rigorous proof that for a linearly separable problem, AdaBoost becomes an L1-margin maximizer when left to run to convergence.
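
In the notation usually used for this result (not quoted from the paper), AdaBoost greedily minimizes the exponential potential of the margins, and the margin being maximized on linearly separable data is the L1-normalized one:

\[
\min_{\alpha}\;\sum_{i=1}^{n} \exp\!\Bigl(-y_i \sum_{t} \alpha_t h_t(x_i)\Bigr),
\qquad
\operatorname{margin}(x_i, y_i) \;=\; \frac{y_i \sum_{t} \alpha_t h_t(x_i)}{\sum_{t} |\alpha_t|}.
\]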

Boosting a weak learning algorithm by majority

An algorithm for improving the accuracy of algorithms for learning binary concepts by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples, is presented.