• Corpus ID: 219636105

Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding the generalization properties of SGD and several applications to…
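The stability notion described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method or experiments: it assumes the absolute loss on unit-norm features (convex, nonsmooth, 1-Lipschitz), synthetic data, and hypothetical helper names (`subgradient`, `run_sgd`, `output_gap`). It runs SGD with the same sampling randomness on two datasets that differ in one example and reports the distance between the two outputs. A single run like this only samples one neighboring pair; uniform stability concerns the worst case over all such pairs.

```python
import math
import random


def subgradient(w, x, y):
    """A subgradient of the absolute loss |<w, x> - y| with respect to w."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    sign = (residual > 0) - (residual < 0)  # 0 is a valid choice at the kink
    return [sign * xi for xi in x]


def run_sgd(data, indices, step_size, dim):
    """Plain SGD from the origin, visiting examples in the given index order."""
    w = [0.0] * dim
    for i in indices:
        x, y = data[i]
        g = subgradient(w, x, y)
        w = [wi - step_size * gi for wi, gi in zip(w, g)]
    return w


def unit_vector(rng, dim):
    """Random direction with Euclidean norm 1, so the loss is 1-Lipschitz."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def output_gap(n=50, dim=5, steps=200, step_size=0.05, seed=0):
    """Distance between SGD outputs on two datasets differing in one example."""
    rng = random.Random(seed)
    data = [(unit_vector(rng, dim), rng.uniform(-1.0, 1.0)) for _ in range(n)]
    neighbor = list(data)
    neighbor[0] = (unit_vector(rng, dim), rng.uniform(-1.0, 1.0))  # replace one point
    indices = [rng.randrange(n) for _ in range(steps)]  # shared sampling randomness
    w = run_sgd(data, indices, step_size, dim)
    w_prime = run_sgd(neighbor, indices, step_size, dim)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_prime)))


gap = output_gap()
print(f"||w - w'|| = {gap:.4f}")
```

Because every subgradient here has norm at most 1, each run moves at most `step_size` per step, so the gap can never exceed `2 * step_size * steps`. That bound is trivial; it is stated only as a sanity check on the sketch, not as a result from the paper.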

Stability and Generalization of Stochastic Gradient Methods for Minimax Problems

This paper provides a comprehensive generalization analysis of stochastic gradient methods for minimax problems under both convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability, and establishes a quantitative connection between stability and several generalization measures both in expectation and with high probability.

Stability of SGD: Tightness Analysis and Improved Bounds

Stochastic gradient descent (SGD) based methods are used for training large-scale machine learning models that generalize well in practice, but no smooth loss functions were known for which the existing stability analysis can be shown to be tight. This work settles open questions regarding the tightness of bounds in the data-independent setting.

Differentially private SGD with non-smooth losses

High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails

This paper develops high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance, and shows that gradient clipping can be employed to remove the bounded gradient-type assumptions.

Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation

This article derives a near-tight high probability bound on the parameter estimation error of a sampling-without-replacement variant of M-SPP, which substantially improves the best known results of SPP-type approaches by revealing the impact of noise level of model on convergence rate.

Stability Based Generalization Bounds for Exponential Family Langevin Dynamics

Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD, is introduced, which includes noisy versions of Sign-SGD and quantized SGD as special cases, and optimization guarantees for special cases of EFLD are established.

Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses

For convex and strongly convex loss functions, this work provides the first asymptotically optimal excess risk bounds (up to a logarithmic factor) and is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality.

Differentially Private SGDA for Minimax Problems

This paper proves that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases, and provides its utility analysis in the nonconvex-strongly-concave setting.

Differentially Private SGD with Non-Smooth Loss

It is proved that noisy SGD with $\alpha$-Hölder smoothness using gradient perturbation can guarantee $(\epsilon,\delta)$-differential privacy (DP) and attain optimal excess population risk with linear gradient complexity $T = O(n)$.

Stability and Generalization for Markov Chain Stochastic Gradient Methods

This paper provides a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory and develops the nearly optimal convergence rates for convex-concave problems.



Data-Dependent Stability of Stochastic Gradient Descent

A data-dependent notion of algorithmic stability for Stochastic Gradient Descent is established, and novel generalization bounds are developed that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.

Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

This paper introduces a new stability measure called on-average model stability, for which novel bounds controlled by the risks of SGD iterates are developed, giving the first known stability and generalization bounds for SGD even with non-differentiable loss functions.

Stability and Convergence Trade-off of Iterative Optimization Algorithms

This paper shows that for any iterative algorithm at any iteration, the overall performance is lower bounded by the minimax statistical error over an appropriately chosen loss function class and provides stability upper bounds for the quadratic loss function.

Private Stochastic Convex Optimization with Optimal Rates

The approach builds on existing differentially private algorithms and relies on the analysis of algorithmic stability to ensure generalization and implies that, contrary to intuition based on private ERM, private SCO has asymptotically the same rate of $1/\sqrt{n}$ as non-private SCO in the parameter regime most common in practice.

Generalization Bounds for Uniformly Stable Algorithms

A tight bound of $O(\gamma^2 + 1/n)$ on the second moment of the generalization error is proved and these results imply substantially stronger generalization guarantees for several well-studied algorithms.

Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

This work provides new algorithms and matching lower bounds for differentially private convex empirical risk minimization assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded.

Private stochastic convex optimization: optimal rates in linear time

Two new techniques for deriving DP convex optimization algorithms, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations, are described.

(Near) Dimension Independent Risk Bounds for Differentially Private Learning

This paper shows that, under certain assumptions, the excess risk bounds for variants of both output and objective perturbation algorithms have no explicit dependence on the dimension p; the excess risk depends only on the $L_2$-norm of the true risk minimizer and that of the training points.

Private Convex Empirical Risk Minimization and High-dimensional Regression

This work significantly extends the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems, and gives the best known algorithms for differentially private linear regression.

Sharper bounds for uniformly stable algorithms

A short proof is provided of a moment bound that implies a generalization bound stronger than both recent results, and a general concentration inequality for weakly correlated random variables is proved, which may be of independent interest.