Generalization Error Bounds for Noisy, Iterative Algorithms

@article{Pensia2018GeneralizationEB,
  title={Generalization Error Bounds for Noisy, Iterative Algorithms},
  author={Ankit Pensia and Varun Jog and Po-Ling Loh},
  journal={2018 IEEE International Symposium on Information Theory (ISIT)},
  year={2018},
  pages={546-550}
}
In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)] has established a bound on the generalization error of empirical risk minimization based on the mutual information $I(S; W)$ between the algorithm input $S$ and the algorithm output $W$, when the loss function is sub-Gaussian. We leverage these results to derive generalization error bounds for a…
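
For reference, the result of Xu and Raginsky (2017) cited above states, roughly, that if the loss is $\sigma$-sub-Gaussian, then for an algorithm with training sample $S = (Z_1, \ldots, Z_n)$ and output $W$, the expected generalization error satisfies $|\mathbb{E}[L_\mu(W) - L_S(W)]| \leq \sqrt{(2\sigma^2 / n)\, I(S; W)}$, where $L_\mu$ and $L_S$ denote the population and empirical risks; any bound on $I(S; W)$ therefore translates directly into a generalization guarantee (see the cited work for the precise conditions).

As a concrete picture of the kind of noisy, iterative algorithm the title refers to, the sketch below implements a generic perturbed-gradient update (an SGLD-style step) in Python. The function names, toy least-squares objective, and hyperparameter values are illustrative assumptions, not quantities taken from the paper.

import numpy as np

def noisy_update(w, grad_fn, batch, step_size, noise_std, rng):
    # One noisy iterative step: a (stochastic) gradient move followed by an
    # isotropic Gaussian perturbation. grad_fn, batch, and the hyperparameters
    # are illustrative placeholders, not quantities from the paper.
    g = grad_fn(w, batch)
    return w - step_size * g + rng.normal(0.0, noise_std, size=w.shape)

# Toy usage: least-squares objective in d = 5 dimensions.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
grad_fn = lambda w, batch: batch.T @ (batch @ w - 1.0) / len(batch)
w = np.zeros(5)
for t in range(1, 201):
    minibatch = data[rng.choice(len(data), size=10, replace=False)]
    w = noisy_update(w, grad_fn, minibatch, step_size=0.05 / t, noise_std=0.01, rng=rng)

Intuitively, the injected noise limits how much information about the training sample $S$ can flow into the final iterate $W$, which is what makes mutual-information bounds of the form above tractable for such algorithms.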

Citations

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
TLDR
A new framework, termed Bayes-Stability, is developed for proving algorithm-dependent generalization error bounds for learning general non-convex objectives, and it is demonstrated that the data-dependent bounds can distinguish randomly labelled data from normal data.
Generalization error bounds using Wasserstein distances
TLDR
Upper bounds on the generalization error are derived in terms of a certain Wasserstein distance involving the distributions of the input and output of an algorithm under the assumption of a Lipschitz continuous loss function.
Tightening Mutual Information-Based Bounds on Generalization Error
TLDR
An information-theoretic upper bound on the generalization error of supervised learning algorithms is derived in terms of the mutual information between each individual training sample and the output of the learning algorithm.
Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints
TLDR
This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and has important implications to statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.
Stability Based Generalization Bounds for Exponential Family Langevin Dynamics
TLDR
This paper unifies and substantially generalizes stability-based generalization bounds and introduces Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD that allows exponential-family noise to be used with stochastic gradient descent (SGD).
Tightening Mutual Information Based Bounds on Generalization Error
TLDR
Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results.
Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels
TLDR
This paper derives distribution-dependent generalization bounds by connecting noisy iterative algorithms to additive noise channels found in communication and information theory, shedding light on several applications, including differentially private stochastic gradient descent (DP-SGD), federated learning, and stochastic gradient Langevin dynamics (SGLD).
Generalization Properties of Stochastic Optimizers via Trajectory Analysis
TLDR
An encompassing theoretical framework for investigating the generalization properties of stochastic optimizers based on their dynamics is developed, and it is shown that both the Fernique–Talagrand functional and the local power-law exponent are predictive of generalization performance.
Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization
TLDR
This paper characterizes the on-average stability of the iterates generated by SGD in terms of the on-average variance of the stochastic gradients, which leads to improved bounds on the generalization error of SGD.
Information-Theoretic Characterization of the Generalization Error for Iterative Semi-Supervised Learning
TLDR
This work provides an exact expression for the generalization error (gen-error), particularizes it to the binary Gaussian mixture model, and shows that regularization can reduce the gen-error.

References

Showing 1–10 of 24 references
Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints
TLDR
This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and has important implications to statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm
TLDR
An improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives is obtained, and it is shown that reweighting the sampling distribution is necessary in order to further improve convergence.
Information-theoretic analysis of generalization capability of learning algorithms
We derive upper bounds on the generalization error of a learning algorithm in terms of the mutual information between its input and output. The bounds provide an information-theoretic understanding of generalization in learning problems…
Information-theoretic analysis of stability and bias of learning algorithms
TLDR
This work proposes several information-theoretic measures of algorithmic stability and uses them to upper-bound the generalization bias of learning algorithms.
Generalization Bounds for Randomized Learning with Application to Stochastic Gradient Descent
TLDR
Stochastic gradient descent, a first-order method that approximates the learning objective and gradient by a random point estimate, is presented.
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
TLDR
The performance of SGD without non-trivial smoothness assumptions is investigated, along with running-average schemes for converting the SGD iterates into a solution with optimal optimization accuracy, and a new and simple averaging scheme is proposed which not only attains optimal rates but can also be easily computed on the fly.
Stability of Randomized Learning Algorithms
TLDR
The formal definitions of stability for randomized algorithms are given, and non-asymptotic bounds on the difference between the empirical and expected error, as well as between the leave-one-out and expected error, of such algorithms are proved in terms of their random stability.
Stability and Generalization
TLDR
Notions of stability for learning algorithms are defined, and it is shown how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error.
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
TLDR
This paper investigates the optimality of SGD in a stochastic setting and shows that for smooth problems the algorithm attains the optimal O(1/T) rate; however, for non-smooth problems the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis.
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable…