• Corpus ID: 219636105

# Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

@article{Bassily2020StabilityOS,
  title={Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses},
  author={Raef Bassily and Vitaly Feldman and Crist{\'o}bal Guzm{\'a}n and Kunal Talwar},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.06914}
}
• Published 12 June 2020 · Computer Science · ArXiv
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding the generalization properties of SGD and several applications to…
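The replace-one-point notion in the abstract can be illustrated with a minimal sketch. Everything below is an assumed toy setup for illustration, not taken from the paper: a one-dimensional nonsmooth convex loss |w·x − y|, subgradient SGD, and a shared sampling seed so both runs draw the same index sequence; the measured quantity is an empirical proxy for the stability parameter.

```python
import numpy as np

def sgd(data, steps=1000, lr=0.01, seed=0):
    """Run subgradient SGD on the nonsmooth convex loss f(w; (x, y)) = |w*x - y|."""
    rng = np.random.default_rng(seed)  # fixed seed: identical sample path on both datasets
    w = 0.0
    for _ in range(steps):
        x, y = data[rng.integers(len(data))]
        g = x * np.sign(w * x - y)     # a subgradient of |w*x - y| with respect to w
        w -= lr * g
    return w

# Build a dataset and a "neighboring" dataset that differs in exactly one point.
rng = np.random.default_rng(1)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
neighbor = list(data)
neighbor[0] = (0.9, -0.9)              # replace a single data point

# Empirical proxy for uniform stability: how much the output moves
# when one training point is swapped.
delta = abs(sgd(data) - sgd(neighbor))
print(f"parameter change from replacing one point: {delta:.4f}")
```

Running both trajectories with the same seed mirrors the coupling argument used in stability proofs: the two runs diverge only at steps that touch the replaced point (or after the iterates have separated), so the final gap stays small relative to the step count.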
105 Citations

## Citations

• ICML 2021: This paper provides a comprehensive generalization analysis of stochastic gradient methods for minimax problems in both the convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability, and establishes a quantitative connection between stability and several generalization measures, both in expectation and with high probability.
• Stochastic Gradient Descent based methods are used for training large-scale machine learning models that generalize well in practice, but no examples of smooth loss functions were known for which the existing stability analysis is tight; this work settles open questions regarding the tightness of bounds in the data-independent setting.
• Applied and Computational Harmonic Analysis, 2022.
• ICML 2022: This paper develops high-probability bounds for nonconvex SGD from a joint perspective of optimization and generalization performance, and shows that gradient clipping can be employed to remove bounded-gradient-type assumptions.
• ArXiv 2023: This article derives a near-tight high-probability bound on the parameter estimation error of a sampling-without-replacement variant of M-SPP, substantially improving the best known results for SPP-type approaches by revealing the impact of the model's noise level on the convergence rate.
• ICML 2022: Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD, is introduced; it includes noisy versions of Sign-SGD and quantized SGD as special cases, and optimization guarantees for special cases of EFLD are established.
• 2022: For convex and strongly convex loss functions, this work provides the first asymptotically optimal excess risk bounds (up to a logarithmic factor), and is the first to address non-convex, non-uniformly-Lipschitz loss functions satisfying the Proximal-PL inequality.
• UAI 2022: This paper proves that DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases, and provides a utility analysis in the nonconvex–strongly-concave setting.
• ArXiv 2021: It is proved that noisy SGD with $\alpha$-Hölder smoothness using gradient perturbation can guarantee $(\epsilon,\delta)$-differential privacy (DP) and attain optimal excess population risk with linear gradient complexity $T = O(n)$.
• ArXiv 2022: This paper provides a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory, and develops nearly optimal convergence rates for convex-concave problems.

## References

Showing 1–10 of 50 references

• ICML 2018: A data-dependent notion of algorithmic stability for stochastic gradient descent is established, and novel generalization bounds are developed that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of the stochastic gradient.
• ICML 2020: This paper introduces a new stability measure called on-average model stability, for which novel bounds controlled by the risks of SGD iterates are developed, giving the first known stability and generalization bounds for SGD with even non-differentiable loss functions.
• ArXiv 2018: This paper shows that for any iterative algorithm at any iteration, the overall performance is lower bounded by the minimax statistical error over an appropriately chosen loss function class, and provides stability upper bounds for the quadratic loss function.
• NeurIPS 2019: The approach builds on existing differentially private algorithms and relies on the analysis of algorithmic stability to ensure generalization; it implies that, contrary to intuition based on private ERM, private SCO has asymptotically the same rate of $1/\sqrt{n}$ as non-private SCO in the parameter regime most common in practice.
• NeurIPS 2018: A tight bound of $O(\gamma^2 + 1/n)$ on the second moment of the generalization error is proved; these results imply substantially stronger generalization guarantees for several well-studied algorithms.
• 2014 IEEE 55th Annual Symposium on Foundations of Computer Science: This work provides new algorithms and matching lower bounds for differentially private convex empirical risk minimization, assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded.
• STOC 2020: Two new techniques for deriving DP convex optimization algorithms are described, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
• ICML 2014: This paper shows that, under certain assumptions, variants of both output and objective perturbation algorithms have no explicit dependence on $p$; the excess risk depends only on the $L_2$-norm of the true risk minimizer and that of the training points.
• COLT 2012: This work significantly extends the analysis of the "objective perturbation" algorithm of Chaudhuri et al. (2011) for convex ERM problems, and gives the best known algorithms for differentially private linear regression.
• COLT 2020: A short proof of the moment bound that implies a generalization bound stronger than both recent results is provided, and a general concentration inequality for weakly correlated random variables is proved, which may be of independent interest.