• Corpus ID: 219636105

Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding the generalization properties of SGD and several applications to… 

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

This key result shows that small generalization error occurs at stationary points, and allows us to bypass Lipschitz or sub-Gaussian assumptions on the loss prevalent in previous works.

Stability and Generalization of Stochastic Gradient Methods for Minimax Problems

This paper provides a comprehensive generalization analysis of stochastic gradient methods for minimax problems in both the convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability, and establishes a quantitative connection between stability and several generalization measures, both in expectation and with high probability.

Stability of SGD: Tightness Analysis and Improved Bounds

Stochastic gradient descent based methods are used for training large-scale machine learning models that generalize well in practice, but no examples of smooth loss functions were known for which the stability analysis can be shown to be tight; this paper settles open questions regarding the tightness of bounds in the data-independent setting.

Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems

A systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems is initiated, and a class of sampling-determined algorithms is introduced, for which bounds on three stability measures are developed.

Differentially private SGD with non-smooth losses

High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails

This paper develops high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance, and shows that gradient clipping can be employed to remove the bounded gradient-type assumptions.

Stability Based Generalization Bounds for Exponential Family Langevin Dynamics

Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD, is introduced; it includes noisy versions of Sign-SGD and quantized SGD as special cases, and optimization guarantees for special cases of EFLD are established.

Differentially Private SGDA for Minimax Problems

This paper proves that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases, and provides its utility analysis in the nonconvex-strongly-concave setting.

Differentially Private SGD with Non-Smooth Loss

It is proved that noisy SGD with $\alpha$-Hölder smoothness using gradient perturbation can guarantee $(\epsilon,\delta)$-differential privacy (DP) and attain optimal excess population risk with linear gradient complexity $T = O(n)$.

Stability and Generalization for Markov Chain Stochastic Gradient Methods

This paper provides a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory and develops the nearly optimal convergence rates for convex-concave problems.



Data-Dependent Stability of Stochastic Gradient Descent

A data-dependent notion of algorithmic stability for Stochastic Gradient Descent is established, and novel generalization bounds are developed that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.

Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

This paper introduces a new stability measure called on-average model stability, for which novel bounds controlled by the risks of SGD iterates are developed, which gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.

Stability and Generalization of Learning Algorithms that Converge to Global Optima

This work derives black-box stability results that depend only on the convergence of a learning algorithm and the geometry around the minimizers of the loss function; these results establish novel generalization bounds for learning algorithms that converge to global minima.

Stability and Convergence Trade-off of Iterative Optimization Algorithms

This paper shows that for any iterative algorithm at any iteration, the overall performance is lower bounded by the minimax statistical error over an appropriately chosen loss function class and provides stability upper bounds for the quadratic loss function.

Train faster, generalize better: Stability of stochastic gradient descent

We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically

Generalization Bounds for Uniformly Stable Algorithms

A tight bound of $O(\gamma^2 + 1/n)$ on the second moment of the generalization error is proved and these results imply substantially stronger generalization guarantees for several well-studied algorithms.

Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

This work provides new algorithms and matching lower bounds for differentially private convex empirical risk minimization assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded.

Private stochastic convex optimization: optimal rates in linear time

Two new techniques for deriving DP convex optimization algorithms are described, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.

(Near) Dimension Independent Risk Bounds for Differentially Private Learning

This paper shows that under certain assumptions, variants of both output and objective perturbation algorithms have no explicit dependence on p; the excess risk depends only on the L2-norm of the true risk minimizer and that of training points.

Private Convex Empirical Risk Minimization and High-dimensional Regression

This work significantly extends the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems, and gives the best known algorithms for differentially private linear regression.