# Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

@article{Bassily2020StabilityOS, title={Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses}, author={Raef Bassily and Vitaly Feldman and Crist{\'o}bal Guzm{\'a}n and Kunal Talwar}, journal={ArXiv}, year={2020}, volume={abs/2006.06914} }

Uniform stability is a notion of algorithmic stability that bounds the worst case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding of the generalization properties of SGD and several applications to…
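To make the replace-one definition concrete, here is a minimal sketch. It assumes the following: the absolute loss $|\langle w, x\rangle - y|$ as a stand-in nonsmooth convex Lipschitz loss, projected SGD on a Euclidean ball, and "change in the model output" measured as the distance between final parameters. The function names, step size, radius and data are hypothetical and chosen only for illustration. Uniform stability takes a supremum over *all* datasets and replacements, so the gap computed for one pair is only a probe. It is not a stability bound.

```python
import math
import random

def abs_loss_subgradient(w, x, y):
    """A subgradient of the nonsmooth convex loss |<w, x> - y| with respect to w."""
    r = sum(wi * xi for wi, xi in zip(w, x)) - y
    s = (r > 0) - (r < 0)  # sign(r); at the kink r == 0 we pick the subgradient 0
    return [s * xi for xi in x]

def project_to_ball(w, radius):
    """Euclidean projection onto the ball of the given radius (a nonexpansive map)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return w if norm <= radius else [wi * radius / norm for wi in w]

def projected_sgd(data, steps, lr, radius, seed):
    """Projected SGD, sampling examples uniformly with replacement; returns the last iterate."""
    rng = random.Random(seed)  # same seed and same dataset size => same index sequence
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        g = abs_loss_subgradient(w, x, y)
        w = project_to_ball([wi - lr * gi for wi, gi in zip(w, g)], radius)
    return w

def replace_one_gap(data, i, new_point, steps, lr, radius, seed):
    """||A(S) - A(S')|| where S' equals S except that example i is replaced."""
    neighbor = list(data)
    neighbor[i] = new_point
    w = projected_sgd(data, steps, lr, radius, seed)
    w_prime = projected_sgd(neighbor, steps, lr, radius, seed)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_prime)))

if __name__ == "__main__":
    rng = random.Random(1)
    data = [([rng.uniform(-1, 1) for _ in range(3)], rng.uniform(-1, 1)) for _ in range(50)]
    gap = replace_one_gap(data, 0, ([1.0, 0.0, 0.0], 5.0), steps=200, lr=0.05, radius=10.0, seed=0)
    print(f"parameter distance after one replacement: {gap:.4f}")
```

Both runs reuse the seed, so they visit the same sample indices. Any divergence therefore comes only from the replaced example. Because the projection is nonexpansive, each step can widen the gap by at most `2 * lr * max_norm` (with `max_norm` the largest $\|x\|$ in $S \cup S'$), where both subgradients have norm at most $\|x\|$.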

## 99 Citations

### Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

- Computer Science · ArXiv
- 2022

A key result shows that small generalization error occurs at stationary points, which allows the analysis to bypass the Lipschitz or sub-Gaussian assumptions on the loss that are prevalent in previous works.

### Stability and Generalization of Stochastic Gradient Methods for Minimax Problems

- Computer Science · ICML
- 2021

This paper provides a comprehensive generalization analysis of stochastic gradient methods for minimax problems in both the convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability, and establishes a quantitative connection between stability and several generalization measures, both in expectation and with high probability.

### Stability of SGD: Tightness Analysis and Improved Bounds

- Computer Science · UAI
- 2022

Stochastic Gradient Descent (SGD) based methods are widely used to train large-scale machine learning models that generalize well in practice, yet there were no known examples of smooth loss functions for which the stability analysis can be shown to be tight; this work settles open questions regarding the tightness of bounds in the data-independent setting.

### Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems

- Computer Science · ArXiv
- 2022

A systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems is initiated, and a class of sampling-determined algorithms is introduced, for which bounds for three stability measures are developed.

### Differentially private SGD with non-smooth losses

- Computer Science · Applied and Computational Harmonic Analysis
- 2022

### High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails

- Computer Science · ICML
- 2022

This paper develops high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance, and shows that gradient clipping can be employed to remove the bounded gradient-type assumptions.

### Stability Based Generalization Bounds for Exponential Family Langevin Dynamics

- Computer Science · ICML
- 2022

Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD, is introduced; it includes noisy versions of Sign-SGD and quantized SGD as special cases, and optimization guarantees for special cases of EFLD are established.

### Differentially Private SGDA for Minimax Problems

- Computer Science · UAI
- 2022

This paper proves that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases, and provides its utility analysis in the nonconvex-strongly-concave setting.

### Differentially Private SGD with Non-Smooth Loss

- Computer Science · ArXiv
- 2021

It is proved that noisy SGD with $\alpha$-Hölder smoothness using gradient perturbation can guarantee $(\epsilon,\delta)$-differential privacy (DP) and attain optimal excess population risk with linear gradient complexity $T = O(n)$.
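The summary above refers to "gradient perturbation", which means adding random noise to each stochastic (sub)gradient before the update. Below is a minimal sketch of that update rule. It assumes isotropic Gaussian noise $N(0, \sigma^2 I)$ and uses the absolute loss as a hypothetical placeholder for a nonsmooth convex loss. The noise scale `sigma` is a free parameter. Calibrating it to a target $(\epsilon,\delta)$ requires the privacy analysis of the cited work, which is not reproduced here, so this code by itself makes no privacy claim.

```python
import random

def abs_loss_subgradient(w, x, y):
    """A subgradient of |<w, x> - y| (placeholder nonsmooth convex loss)."""
    r = sum(wi * xi for wi, xi in zip(w, x)) - y
    s = (r > 0) - (r < 0)  # sign(r); 0 chosen at the kink
    return [s * xi for xi in x]

def noisy_sgd(data, steps, lr, sigma, seed):
    """SGD with gradient perturbation: Gaussian noise N(0, sigma^2) is added to each
    coordinate of the sampled subgradient. sigma is NOT calibrated to any (epsilon, delta)."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        g = abs_loss_subgradient(w, x, y)
        w = [wi - lr * (gi + rng.gauss(0.0, sigma)) for wi, gi in zip(w, g)]
    return w  # last iterate (averaging the iterates is another common choice)

if __name__ == "__main__":
    data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.5)]
    # steps=len(data) mirrors the linear gradient budget T = O(n) mentioned above.
    print(noisy_sgd(data, steps=len(data), lr=0.1, sigma=1.0, seed=7))
```

Using one seeded generator for both sampling and noise keeps runs reproducible. A real private implementation would need a cryptographically sound noise source.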

### Stability and Generalization for Markov Chain Stochastic Gradient Methods

- Computer Science, Mathematics
- 2022

This paper provides a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory, and develops nearly optimal convergence rates for convex-concave problems.

## References

Showing 1-10 of 50 references

### Data-Dependent Stability of Stochastic Gradient Descent

- Computer Science · ICML
- 2018

A data-dependent notion of algorithmic stability for Stochastic Gradient Descent is established, and novel generalization bounds are developed that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.

### Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

- Computer Science · ICML
- 2020

This paper introduces a new stability measure called on-average model stability and develops novel bounds for it that are controlled by the risks of the SGD iterates, yielding the first known stability and generalization bounds for SGD even with non-differentiable loss functions.

### Stability and Generalization of Learning Algorithms that Converge to Global Optima

- Computer Science · ICML
- 2018

This work derives black-box stability results that depend only on the convergence of a learning algorithm and the geometry around the minimizers of the loss function; these results establish novel generalization bounds for learning algorithms that converge to global minima.

### Stability and Convergence Trade-off of Iterative Optimization Algorithms

- Computer Science · ArXiv
- 2018

This paper shows that for any iterative algorithm at any iteration, the overall performance is lower bounded by the minimax statistical error over an appropriately chosen loss function class and provides stability upper bounds for the quadratic loss function.

### Train faster, generalize better: Stability of stochastic gradient descent

- Computer Science · ICML
- 2016

We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically…

### Generalization Bounds for Uniformly Stable Algorithms

- Computer Science, Mathematics · NeurIPS
- 2018

A tight bound of $O(\gamma^2 + 1/n)$ on the second moment of the generalization error is proved and these results imply substantially stronger generalization guarantees for several well-studied algorithms.

### Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

- Computer Science · 2014 IEEE 55th Annual Symposium on Foundations of Computer Science
- 2014

This work provides new algorithms and matching lower bounds for differentially private convex empirical risk minimization assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded.

### Private stochastic convex optimization: optimal rates in linear time

- Computer Science · STOC
- 2020

Two new techniques for deriving DP convex optimization algorithms are described; both achieve the optimal bound on excess loss and use $O(\min\{n, n^2/d\})$ gradient computations.

### (Near) Dimension Independent Risk Bounds for Differentially Private Learning

- Computer Science · ICML
- 2014

This paper shows that under certain assumptions, variants of both output and objective perturbation algorithms have no explicit dependence on $p$; the excess risk depends only on the $L_2$-norm of the true risk minimizer and that of the training points.

### Private Convex Empirical Risk Minimization and High-dimensional Regression

- Computer Science, Mathematics · COLT 2012
- 2012

This work significantly extends the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems, and gives the best known algorithms for differentially private linear regression.