Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization

@article{Mukherjee2006LearningTS,
  title={Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization},
  author={Sayan Mukherjee and Partha Niyogi and Tomaso A. Poggio and Ryan M. Rifkin},
  journal={Advances in Computational Mathematics},
  year={2006},
  volume={25},
  pages={161--193}
}
Abstract: Solutions of learning problems by Empirical Risk Minimization (ERM) – and almost-ERM when the minimizer does not exist – need to be consistent, so that they may be predictive. They also need to be well-posed in the sense of being stable, so that they might be used robustly. We propose a statistical form of stability, defined as leave-one-out (LOO) stability. We prove that for bounded loss classes LOO stability is (a) sufficient for generalization, that is, convergence in probability of…
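To make the statistical notion concrete, the following is a minimal sketch of the cross-validation leave-one-out (CV_loo) stability requirement as the abstract describes it; the notation (S a training set of n examples z_1, …, z_n, f_S the hypothesis returned on S, S^{\setminus i} the set with the i-th example deleted, V a bounded loss) is assumed for illustration rather than quoted from the paper:

% notation assumed for illustration, not quoted from the paper
\[
\forall i \in \{1,\dots,n\}:\quad
\mathbb{P}_{S}\Bigl\{\, \bigl|\, V(f_{S^{\setminus i}}, z_i) - V(f_{S}, z_i) \,\bigr| \le \beta_{CV} \Bigr\} \ge 1 - \delta_{CV},
\qquad \beta_{CV} \to 0,\ \ \delta_{CV} \to 0 \ \text{ as } n \to \infty .
\]

Informally: deleting the i-th training point changes the loss of the learned hypothesis at that point only slightly, with high probability over the draw of S, and the rates β_CV, δ_CV tend to zero with n. The paper's claim, per the title and abstract, is that for bounded loss classes a LOO-stability requirement of this kind is sufficient for generalization and, for (almost-)ERM, necessary and sufficient for consistency.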
Learnability, Stability and Uniform Convergence
TLDR: This paper considers the General Learning Setting (introduced by Vapnik), which includes most statistical learning problems as special cases, and identifies stability as the key necessary and sufficient condition for learnability.
An Exponential Efron-Stein Inequality for Lq Stable Learning Rules
TLDR: An exponential tail bound is derived for the concentration of the estimated risk of a hypothesis returned by a general learning rule, where the estimated risk is expressed in terms of either the resubstitution estimate (empirical error) or the deleted (leave-one-out) estimate.
Approximation Stability and Boosting
TLDR: Approximation stability is introduced and it is proved that AdaBoost has approximation stability and thus generalizes well; an exponential bound for AdaBoost is provided.
Stability Conditions for Online Learnability
Stability is a general notion that quantifies the sensitivity of a learning algorithm's output to small changes in the training dataset (e.g. deletion or replacement of a single training sample; see the sketch after this list). Such…
Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization
We show that the average stability notion introduced by Kearns and Ron (1999) and Bousquet and Elisseeff (2002) is invariant to data preconditioning, for a wide class of generalized linear models that…
Online Learning, Stability, and Stochastic Gradient Descent
TLDR: It is shown that stochastic gradient descent (SGD) with the usual hypotheses is CVon stable, and the implications of CVon stability for convergence of SGD are discussed.
Sufficient Conditions for Uniform Stability of Regularization Algorithms
In this paper, we study the stability and generalization properties of penalized empirical-risk minimization algorithms. We propose a set of properties of the penalty term that is sufficient to…
Stable Foundations for Learning: a foundational framework for learning theory in both the classical and modern regime.
I consider here the class of supervised learning algorithms known as Empirical Risk Minimization (ERM). The classical theory by Vapnik and others characterizes universal consistency of ERM in the…
Stability and Generalization of Learning Algorithms that Converge to Global Optima
TLDR: This work derives black-box stability results that depend only on the convergence of a learning algorithm and the geometry around the minimizers of the loss function, and establishes novel generalization bounds for learning algorithms that converge to global minima.
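Several of the entries above appeal to a delete-one (or replace-one) notion of stability. For comparison with the LOO notion sketched under the abstract, here is a minimal sketch of uniform (hypothesis) stability in the style usually attributed to Bousquet and Elisseeff; the notation is again assumed for illustration only and is not quoted from any of the papers listed:

% notation assumed: S a training set of size n, S^{\setminus i} the set with the i-th example removed,
% f_S the hypothesis returned on S, V a bounded loss, beta_n a stability rate
\[
\forall S,\ \forall i \in \{1,\dots,n\}:\quad
\sup_{z}\ \bigl|\, V(f_{S}, z) - V(f_{S^{\setminus i}}, z) \,\bigr| \;\le\; \beta_n .
\]

Here β_n → 0 (typically O(1/n)) must hold for every training set S and every test point z, which is a much stronger demand than the CV_loo condition above: the latter only controls the loss at the deleted point itself, and only in probability over the sample.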

References

Showing 1-10 of 44 references
Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization
Abstract: Solutions of learning problems by Empirical Risk Minimization (ERM) – and almost-ERM when the minimizer does not exist – need to be consistent, so that they may be predictive. They also…
Almost-everywhere Algorithmic Stability and Generalization Error
TLDR: A new notion of training stability of a learning algorithm is introduced and it is shown that, in a general setting, it is sufficient for good bounds on generalization error.
Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation
In this article we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this…
General conditions for predictivity in learning theory
TLDR: Conditions for generalization in terms of a precise stability property of the learning process are provided: when the training set is perturbed by deleting one example, the learned hypothesis does not change much.
Stability and Generalization
TLDR: Notions of stability for learning algorithms are defined, and it is shown how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error.
Algorithmic Stability and Generalization Performance
TLDR: This work presents a novel way of obtaining PAC-style bounds on the generalization error of learning algorithms, explicitly using their stability properties, and demonstrates that regularization networks possess the required stability property.
Scale-sensitive dimensions, uniform convergence, and learnability
TLDR: A characterization of learnability in the probabilistic concept model is given, solving an open problem posed by Kearns and Schapire, and it is shown that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
On Convergence of Stochastic Processes
It is clear that for given {μ_n} and t, the better theorem of this kind would be the one in which (2) is proved for the larger class of functions f. In this paper we shall show that certain known…
Ill-posed problems in early vision
Mathematical results on ill-posed and ill-conditioned problems are reviewed and the formal aspects of regularization theory in the linear case are introduced. Specific topics in early vision and…