# Generalization Error Bounds for Noisy, Iterative Algorithms

@article{Pensia2018GeneralizationEB, title={Generalization Error Bounds for Noisy, Iterative Algorithms}, author={Ankit Pensia and Varun Jog and Po-Ling Loh}, journal={2018 IEEE International Symposium on Information Theory (ISIT)}, year={2018}, pages={546-550} }

In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)] has established a bound on the generalization error of empirical risk minimization based on the mutual information $I(S; W)$ between the algorithm input $S$ and the algorithm output $W$, when the loss function is sub-Gaussian. We leverage these results to derive generalization error bounds for a…
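
For reference, the Xu and Raginsky result cited above states that if the loss $\ell(w, Z)$ is $\sigma$-sub-Gaussian under the data distribution for every $w$, then the expected generalization error satisfies $|\mathbb{E}[\mathrm{gen}]| \le \sqrt{2\sigma^2 I(S; W)/n}$, where $n$ is the number of training samples in $S$. The minimal sketch below only evaluates this expression; the function name is hypothetical, and it assumes $I(S; W)$ (in nats) has already been estimated or bounded, which is the hard part that this paper addresses for noisy, iterative algorithms.

```python
import math


def mi_generalization_bound(mutual_info: float, sigma: float, n: int) -> float:
    """Evaluate sqrt(2 * sigma^2 * I(S; W) / n), the Xu-Raginsky bound.

    mutual_info: I(S; W) in nats (assumed to be estimated or bounded elsewhere).
    sigma: sub-Gaussian parameter of the loss.
    n: number of training samples in S.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if mutual_info < 0 or sigma < 0:
        raise ValueError("mutual_info and sigma must be non-negative")
    return math.sqrt(2.0 * sigma ** 2 * mutual_info / n)


# For a fixed I(S; W), the bound shrinks like 1 / sqrt(n).
for n in (100, 400, 1600):
    print(n, mi_generalization_bound(mutual_info=2.0, sigma=0.5, n=n))
```

Quadrupling $n$ halves the bound when $I(S; W)$ stays fixed; in practice $I(S; W)$ may itself grow with $n$, which is why algorithm-specific bounds on it matter.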

## 52 Citations

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

- Computer Science
- ICLR
- 2020

A new framework, termed Bayes-Stability, is developed for proving algorithm-dependent generalization error bounds for learning general non-convex objectives, and it is demonstrated that the data-dependent bounds can distinguish randomly labelled data from normal data.

Generalization error bounds using Wasserstein distances

- Computer Science
- 2018 IEEE Information Theory Workshop (ITW)
- 2018

Upper bounds on the generalization error are derived in terms of a certain Wasserstein distance involving the distributions of the input and output of an algorithm under the assumption of a Lipschitz continuous loss function.

Tightening Mutual Information-Based Bounds on Generalization Error

- Computer Science
- IEEE Journal on Selected Areas in Information Theory
- 2020

An information-theoretic upper bound on the generalization error of supervised learning algorithms is derived in terms of the mutual information between each individual training sample and the output of the learning algorithm.
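
To our understanding, the individual-sample bound in this line of work (Bu, Zou, and Veeravalli) takes the form $|\mathbb{E}[\mathrm{gen}]| \le \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^2 I(W; Z_i)}$ for a $\sigma$-sub-Gaussian loss, replacing the single term $I(S; W)$ with per-sample terms $I(W; Z_i)$. The sketch below evaluates that expression under this assumption; the function name is hypothetical, and each $I(W; Z_i)$ is assumed to be supplied in nats.

```python
import math
from typing import Sequence


def individual_sample_mi_bound(sample_mis: Sequence[float], sigma: float) -> float:
    """Evaluate (1/n) * sum_i sqrt(2 * sigma^2 * I(W; Z_i)).

    sample_mis[i] is the mutual information I(W; Z_i) between the algorithm
    output W and the i-th training sample Z_i (assumed estimated elsewhere).
    """
    n = len(sample_mis)
    if n == 0:
        raise ValueError("need at least one sample")
    if sigma < 0 or any(mi < 0 for mi in sample_mis):
        raise ValueError("sigma and mutual informations must be non-negative")
    return sum(math.sqrt(2.0 * sigma ** 2 * mi) for mi in sample_mis) / n


print(individual_sample_mi_bound([0.5, 0.5], sigma=1.0))
```

A design point worth noting: the bound stays finite even when $I(S; W)$ is infinite (for example, for a deterministic algorithm with continuous data) as long as each $I(W; Z_i)$ is finite.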

Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

- Computer Science
- COLT
- 2018

This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and has important implications to statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.

Stability Based Generalization Bounds for Exponential Family Langevin Dynamics

- Computer Science
- 2022

This paper unifies and substantially generalizes stability-based generalization bounds and introduces Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD that allows exponential-family noise to be used with stochastic gradient descent (SGD).

Tightening Mutual Information Based Bounds on Generalization Error

- Computer Science
- 2019 IEEE International Symposium on Information Theory (ISIT)
- 2019

Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results.

Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels

- Computer Science
- 2021

This paper derives distribution-dependent generalization bounds by connecting noisy iterative algorithms to additive noise channels found in communication and information theory, and uses them to shed light on several applications, including differentially private stochastic gradient descent (DP-SGD), federated learning, and stochastic gradient Langevin dynamics (SGLD).
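
A common instance of the noisy, iterative algorithms named in the title is a gradient step followed by additive Gaussian noise, $W_t = W_{t-1} - \eta_t \nabla \hat{L}_S(W_{t-1}) + \xi_t$ with $\xi_t \sim \mathcal{N}(0, \sigma_t^2 I)$. The pure-Python sketch below runs that generic update. `grad_fn` is a hypothetical full-gradient oracle for the empirical loss, and the named algorithms differ in details this sketch omits: SGLD ties the noise scale to the step size and usually uses minibatches, and DP-SGD also clips per-example gradients.

```python
import random
from typing import Callable, List, Sequence


def noisy_iterative_algorithm(
    grad_fn: Callable[[List[float]], List[float]],
    w0: Sequence[float],
    step_sizes: Sequence[float],
    noise_stds: Sequence[float],
    seed: int = 0,
) -> List[float]:
    """Run W_t = W_{t-1} - eta_t * grad_fn(W_{t-1}) + xi_t, xi_t ~ N(0, sigma_t^2 I)."""
    if len(step_sizes) != len(noise_stds):
        raise ValueError("step_sizes and noise_stds must have equal length")
    rng = random.Random(seed)
    w = list(w0)
    for eta, sigma in zip(step_sizes, noise_stds):
        g = grad_fn(w)
        # Gradient step followed by independent isotropic Gaussian noise.
        w = [wi - eta * gi + rng.gauss(0.0, sigma) for wi, gi in zip(w, g)]
    return w


# Usage: quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w_T = noisy_iterative_algorithm(lambda w: list(w), [2.0, -1.0], [0.1] * 50, [0.05] * 50)
print(w_T)
```

Setting every noise level to zero recovers plain gradient descent; the injected noise is what keeps the information the iterates carry about $S$ bounded.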

Generalization Properties of Stochastic Optimizers via Trajectory Analysis

- Computer Science
- ArXiv
- 2021

An encompassing theoretical framework for investigating the generalization properties of stochastic optimizers, based on their dynamics, is introduced, and it is shown that both the Fernique–Talagrand functional and the local power-law exponent are predictive of generalization performance.

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

- Computer Science
- ArXiv
- 2018

This paper characterizes the on-average stability of the iterates generated by SGD in terms of the on-average variance of the stochastic gradients, which leads to improved bounds on the generalization error of SGD.

Information-Theoretic Characterization of the Generalization Error for Iterative Semi-Supervised Learning

- Computer Science
- 2021

This work provides an exact expression for the gen-error, particularizes it to the binary Gaussian mixture model, and shows that regularization can reduce the gen-error.

## References

Showing 1-10 of 24 references.

Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

- Computer Science
- COLT
- 2018

This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and has important implications to statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.

Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm

- Computer Science, Mathematics
- NIPS
- 2014

An improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives is obtained, and it is shown how reweighting the sampling distribution is necessary in order to further improve convergence.
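
One standard way to reweight the sampling distribution, shown here as a hedged sketch rather than the paper's exact scheme (which, as we recall, also considers mixing with uniform sampling), is importance sampling: draw example $i$ with probability $p_i \propto L_i$, where $L_i$ is an assumed per-example smoothness constant, and scale its gradient by $1/(n p_i)$ so the stochastic gradient stays an unbiased estimate of the average gradient. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def sampling_probs(constants: Sequence[float]) -> List[float]:
    """p_i proportional to the per-example constant L_i (assumed scheme)."""
    total = sum(constants)
    if total <= 0:
        raise ValueError("constants must have a positive sum")
    return [c / total for c in constants]


def reweighted_gradient(grads: Sequence[Sequence[float]], probs: Sequence[float], i: int) -> List[float]:
    """Scale the sampled gradient by 1 / (n * p_i) so the estimate is unbiased."""
    scale = 1.0 / (len(grads) * probs[i])
    return [scale * g for g in grads[i]]


def expected_reweighted_gradient(grads: Sequence[Sequence[float]], probs: Sequence[float]) -> List[float]:
    """Exact expectation over i ~ probs; equals the average gradient up to rounding."""
    out = [0.0] * len(grads[0])
    for i, p in enumerate(probs):
        for k, v in enumerate(reweighted_gradient(grads, probs, i)):
            out[k] += p * v
    return out


# Usage: draw one index according to probs and form the reweighted gradient.
grads = [[1.0, 0.0], [3.0, 2.0]]
probs = sampling_probs([1.0, 3.0])
i = random.Random(0).choices(range(len(probs)), weights=probs)[0]
print(i, reweighted_gradient(grads, probs, i))
print(expected_reweighted_gradient(grads, probs))  # should match the mean gradient [2.0, 1.0]
```

Without the $1/(n p_i)$ factor, nonuniform sampling would bias SGD toward the frequently drawn examples.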

Information-theoretic analysis of generalization capability of learning algorithms

- Computer Science
- NIPS
- 2017

We derive upper bounds on the generalization error of a learning algorithm in terms of the mutual information between its input and output. The bounds provide an information-theoretic understanding…

Information-theoretic analysis of stability and bias of learning algorithms

- Computer Science
- 2016 IEEE Information Theory Workshop (ITW)
- 2016

This work proposes several information-theoretic measures of algorithmic stability and uses them to upper-bound the generalization bias of learning algorithms.

Generalization Bounds for Randomized Learning with Application to Stochastic Gradient Descent

- Computer Science
- 2016

Stochastic gradient descent, a first-order method that approximates the learning objective and gradient by a random point estimate, is presented.

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

- Computer Science
- ICML
- 2013

The performance of SGD without non-trivial smoothness assumptions is investigated, as is a running-average scheme that converts the SGD iterates into a solution with optimal optimization accuracy; a new, simple averaging scheme is also proposed that not only attains optimal rates but can also be easily computed on the fly.
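
To our understanding, the on-the-fly scheme proposed in that paper is polynomial-decay averaging, $\bar{w}_t = \left(1 - \frac{\eta + 1}{t + \eta}\right)\bar{w}_{t-1} + \frac{\eta + 1}{t + \eta} w_t$ with a parameter $\eta \ge 0$. The sketch below implements that recursion under this assumption; the function name is hypothetical, and the default $\eta = 3$ is an illustrative choice, not a recommendation from the paper.

```python
from typing import List, Sequence


def polynomial_decay_average(iterates: Sequence[Sequence[float]], eta: float = 3.0) -> List[float]:
    """On-the-fly average avg_t = (1 - c_t) * avg_{t-1} + c_t * w_t, c_t = (eta + 1) / (t + eta).

    eta = 0 recovers the ordinary running mean; larger eta puts more weight
    on recent iterates. Only O(d) memory is used, regardless of the number of steps.
    """
    if not iterates:
        raise ValueError("need at least one iterate")
    if eta < 0:
        raise ValueError("eta must be non-negative")
    avg = [0.0] * len(iterates[0])
    for t, w in enumerate(iterates, start=1):
        c = (eta + 1.0) / (t + eta)  # equals 1 at t = 1, so avg starts at w_1
        avg = [(1.0 - c) * a + c * wi for a, wi in zip(avg, w)]
    return avg


print(polynomial_decay_average([[1.0], [2.0], [3.0]], eta=0.0))  # running mean of 1, 2, 3
print(polynomial_decay_average([[1.0], [2.0], [3.0]], eta=3.0))  # tilted toward later iterates
```

Because each step touches only the current average and iterate, the scheme can run alongside SGD without storing the trajectory, which is the practical advantage the summary highlights.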

Stability of Randomized Learning Algorithms

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2005

Formal definitions of stability for randomized algorithms are given, and non-asymptotic bounds that depend on their random stability are proved for the difference between the empirical and expected error, as well as between the leave-one-out and expected error.

Stability and Generalization

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2002

Notions of stability for learning algorithms are defined, and it is shown how to use them to derive generalization error bounds based on the empirical error and the leave-one-out error.

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

- Computer Science
- ICML
- 2012

This paper investigates the optimality of SGD in a stochastic setting and shows that for smooth problems the algorithm attains the optimal $O(1/T)$ rate; however, for non-smooth problems, the convergence rate with averaging might really be $\Omega(\log(T)/T)$, and this is not just an artifact of the analysis.

Train faster, generalize better: Stability of stochastic gradient descent

- Computer Science
- ICML
- 2016

We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically…
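
The summary above is cut off before the bound itself. As we recall the paper's convex-case result (to be checked against the paper), SGM run with step sizes $\alpha_t \le 2/\beta$ on a convex, $L$-Lipschitz, $\beta$-smooth loss over $n$ examples is uniformly stable with parameter at most $\frac{2L^2}{n}\sum_{t=1}^{T}\alpha_t$, and uniform stability of that size bounds the expected generalization error. The sketch below only evaluates this assumed expression; the function name is hypothetical.

```python
from typing import Sequence


def sgm_stability_bound_convex(lipschitz: float, step_sizes: Sequence[float], n: int) -> float:
    """Evaluate (2 * L^2 / n) * sum_t alpha_t.

    Assumed setting: convex, L-Lipschitz, beta-smooth loss, step sizes
    alpha_t <= 2 / beta (not checked here), and n training examples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 2.0 * lipschitz ** 2 / n * sum(step_sizes)


# Fewer SGM steps (a smaller total step size) give a smaller bound.
short_run = sgm_stability_bound_convex(1.0, [0.01] * 100, n=10_000)
long_run = sgm_stability_bound_convex(1.0, [0.01] * 10_000, n=10_000)
print(short_run, long_run)
```

The bound grows with the aggregated step size $\sum_t \alpha_t$, which matches the "train faster, generalize better" message: stopping after fewer iterations keeps the generalization gap small.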