Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
@article{Bassily2020StabilityOS, title={Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses}, author={Raef Bassily and Vitaly Feldman and Cristóbal Guzmán and Kunal Talwar}, journal={ArXiv}, year={2020}, volume={abs/2006.06914} }
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding the generalization properties of SGD and several applications to…
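For reference, the uniform stability notion described in the abstract is commonly formalized as follows; the symbols $A$, $S$, $S'$, $z$, $\ell$, and $\gamma$ are generic notation introduced here for illustration rather than taken from the paper. A (possibly randomized) algorithm $A$ is $\gamma$-uniformly stable if, for every pair of datasets $S$ and $S'$ of size $n$ differing in a single example and every point $z$,

$$\mathbb{E}_{A}\big[\ell(A(S), z)\big] - \mathbb{E}_{A}\big[\ell(A(S'), z)\big] \;\le\; \gamma.$$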
105 Citations
Stability and Generalization of Stochastic Gradient Methods for Minimax Problems
- Computer Science, ICML
- 2021
This paper provides a comprehensive generalization analysis of stochastic gradient methods for minimax problems in both the convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability, and establishes a quantitative connection between stability and several generalization measures, both in expectation and with high probability.
Stability of SGD: Tightness Analysis and Improved Bounds
- Computer Science, UAI
- 2022
Stochastic Gradient Descent based methods are used for training large-scale machine learning models that generalize well in practice, but it was unknown whether the existing stability analysis for smooth loss functions is tight; this paper settles open questions regarding the tightness of these bounds in the data-independent setting.
Differentially private SGD with non-smooth losses
- Computer Science, Applied and Computational Harmonic Analysis
- 2022
High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails
- Computer Science, ICML
- 2022
This paper develops high probability bounds for nonconvex SGD from a joint perspective of optimization and generalization performance, and shows that gradient clipping can be employed to remove bounded-gradient assumptions.
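As an illustration of the gradient-clipping idea mentioned above, here is a minimal sketch of a norm-clipped SGD step in Python/NumPy; the function names, step size, and toy least-squares gradient are hypothetical, and this is a generic rendering of gradient clipping rather than the exact procedure analyzed in the cited paper.

```python
import numpy as np

def clip_gradient(grad, max_norm):
    """Rescale grad so its Euclidean norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def sgd_step(w, grad, lr=0.1, max_norm=1.0):
    """One SGD update using a norm-clipped stochastic gradient."""
    return w - lr * clip_gradient(grad, max_norm)

# Toy usage: squared-error gradient on a single sample (x, y).
rng = np.random.default_rng(0)
w = np.zeros(5)
x, y = rng.normal(size=5), 1.0
grad = (x @ w - y) * x  # gradient of 0.5 * (x.w - y)^2 with respect to w
w = sgd_step(w, grad)
```

Clipping bounds the norm of every applied update, which is why it can stand in for a bounded-gradient assumption in the analysis.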
Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation
- Computer Science, ArXiv
- 2023
This article derives a near-tight high probability bound on the parameter estimation error of a sampling-without-replacement variant of M-SPP, which substantially improves the best known results for SPP-type approaches by revealing the impact of the noise level of the model on the convergence rate.
Stability Based Generalization Bounds for Exponential Family Langevin Dynamics
- Computer Science, ICML
- 2022
Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD that includes noisy versions of Sign-SGD and quantized SGD as special cases, is introduced, and optimization guarantees for special cases of EFLD are established.
Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses
- Computer Science, Mathematics
- 2022
For convex and strongly convex loss functions, this work provides the first asymptotically optimal excess risk bounds (up to a logarithmic factor) and is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality.
Differentially Private SGDA for Minimax Problems
- Computer Science, UAI
- 2022
This paper proves that DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases, and provides its utility analysis in the nonconvex-strongly-concave setting.
Differentially Private SGD with Non-Smooth Loss
- Computer Science, ArXiv
- 2021
It is proved that noisy SGD using gradient perturbation with $\alpha$-Hölder smooth losses can guarantee $(\epsilon,\delta)$-differential privacy (DP) and attain optimal excess population risk with linear gradient complexity $T = O(n)$.
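For context on the gradient-perturbation mechanism mentioned above, below is a minimal sketch of a noisy (clipped, Gaussian-perturbed) SGD step; the clipping bound, noise multiplier, and function names are illustrative assumptions and do not reflect the noise calibration or privacy accounting of the cited paper.

```python
import numpy as np

def noisy_sgd_step(w, grad, lr, clip_norm, noise_multiplier, rng):
    """SGD update with a norm-clipped gradient plus Gaussian noise (gradient perturbation)."""
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=grad.shape)
    return w - lr * (grad + noise)

rng = np.random.default_rng(0)
w = np.zeros(5)
grad = rng.normal(size=5)  # stand-in for a per-example stochastic gradient
w = noisy_sgd_step(w, grad, lr=0.05, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```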
Stability and Generalization for Markov Chain Stochastic Gradient Methods
- Computer Science, Mathematics, ArXiv
- 2022
This paper provides a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory, and develops nearly optimal convergence rates for convex-concave problems.
References
Data-Dependent Stability of Stochastic Gradient Descent
- Computer Science, ICML
- 2018
A data-dependent notion of algorithmic stability for Stochastic Gradient Descent is established, and novel generalization bounds are developed that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of the stochastic gradients.
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
- Computer Science, ICML
- 2020
This paper introduces a new stability measure called on-average model stability, for which novel bounds controlled by the risks of SGD iterates are developed; this yields the first known stability and generalization bounds for SGD even with non-differentiable loss functions.
Stability and Convergence Trade-off of Iterative Optimization Algorithms
- Computer Science, ArXiv
- 2018
This paper shows that for any iterative algorithm at any iteration, the overall performance is lower bounded by the minimax statistical error over an appropriately chosen loss function class and provides stability upper bounds for the quadratic loss function.
Private Stochastic Convex Optimization with Optimal Rates
- Computer Science, NeurIPS
- 2019
The approach builds on existing differentially private algorithms and relies on the analysis of algorithmic stability to ensure generalization and implies that, contrary to intuition based on private ERM, private SCO has asymptotically the same rate of $1/\sqrt{n}$ as non-private SCO in the parameter regime most common in practice.
Generalization Bounds for Uniformly Stable Algorithms
- Computer Science, Mathematics, NeurIPS
- 2018
A tight bound of $O(\gamma^2 + 1/n)$ on the second moment of the generalization error is proved and these results imply substantially stronger generalization guarantees for several well-studied algorithms.
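In a commonly cited form, the moment bound referenced above reads as follows for a $\gamma$-uniformly stable algorithm $A$ with losses bounded in $[0,1]$, where $R$ is the population risk and $R_S$ the empirical risk on the sample $S$ of size $n$; this is a rough rendering, and the precise constants and conditions are those in the paper:

$$\mathbb{E}_{S,A}\Big[\big(R(A(S)) - R_S(A(S))\big)^{2}\Big] \;\le\; O\big(\gamma^{2} + 1/n\big).$$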
Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds
- Computer Science, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science
- 2014
This work provides new algorithms and matching lower bounds for differentially private convex empirical risk minimization assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded.
Private stochastic convex optimization: optimal rates in linear time
- Computer Science, STOC
- 2020
Two new techniques for deriving DP convex optimization algorithms are described, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
(Near) Dimension Independent Risk Bounds for Differentially Private Learning
- Computer Science, ICML
- 2014
This paper shows that under certain assumptions, variants of both output and objective perturbation algorithms have no explicit dependence on the dimension $p$; the excess risk depends only on the $L_2$-norm of the true risk minimizer and that of the training points.
Private Convex Empirical Risk Minimization and High-dimensional Regression
- Computer Science, Mathematics, COLT 2012
- 2012
This work significantly extends the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems, and gives the best known algorithms for differentially private linear regression.
Sharper bounds for uniformly stable algorithms
- Computer Science, Mathematics, COLT
- 2020
A short proof of the moment bound is given, implying a generalization bound stronger than both recent results, and a general concentration inequality for weakly correlated random variables, which may be of independent interest, is also proved.