Sub-linear convergence of a tamed stochastic gradient descent method in Hilbert space

By Monika Eisenmann and Tony Stillfjord. Published in SIAM J. Optim.
In this paper, we introduce the tamed stochastic gradient descent method (TSGD) for optimization problems. Inspired by the tamed Euler scheme, which is a commonly used method within the context of stochastic differential equations, TSGD is an explicit scheme that exhibits stability properties similar to those of implicit schemes. As its computational cost is essentially equivalent to that of the well-known stochastic gradient descent method (SGD), it constitutes a very competitive alternative… 
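The taming idea can be sketched in a few lines. The update below uses a tamed-Euler-style damping factor 1/(1 + α‖g‖), which caps the effective step length no matter how large the stochastic gradient is; this is a minimal illustration and may differ in detail from the exact scheme analyzed in the paper.

```python
import numpy as np

def tamed_sgd_step(x, grad, alpha):
    """One tamed SGD update: the factor 1 / (1 + alpha * ||grad||) bounds
    the effective step even for very large gradients, mirroring the tamed
    Euler scheme for SDEs.  (Sketch only; the paper's taming may differ.)"""
    return x - alpha * grad / (1.0 + alpha * np.linalg.norm(grad))

# Toy problem: minimize f(x) = 0.5 * ||x||^2 from noisy gradients g = x + noise.
rng = np.random.default_rng(0)
x = np.array([10.0, -10.0])
for k in range(1, 2001):
    g = x + 0.1 * rng.standard_normal(2)      # stochastic gradient
    x = tamed_sgd_step(x, g, alpha=1.0 / k)   # decreasing step sizes
print(np.linalg.norm(x))
```

Note that the per-iteration cost is a single gradient evaluation plus a norm, which is why the scheme is essentially as cheap as plain SGD.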


Sub-linear convergence of a stochastic proximal iteration method in Hilbert space

A stochastic version of the proximal point algorithm for optimization problems posed on a Hilbert space is considered; this formulation makes it possible to prove convergence at an (optimal) sub-linear rate also in an infinite-dimensional setting.
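For a single least-squares term the proximal subproblem has a closed form, so a stochastic proximal point iteration can be sketched concretely. This is an illustrative finite-dimensional toy (via the Sherman-Morrison identity), not the paper's Hilbert-space setting.

```python
import numpy as np

def spp_step(x, a, b, alpha):
    """Stochastic proximal point step for f_i(x) = 0.5 * (a @ x - b)**2.
    The subproblem argmin_z f_i(z) + ||z - x||^2 / (2 * alpha) has this
    closed-form solution (rank-one case of Sherman-Morrison)."""
    return x - alpha * (a @ x - b) / (1.0 + alpha * (a @ a)) * a

# Toy consistent least-squares problem A x = b.
rng = np.random.default_rng(1)
n, d = 200, 3
A = rng.standard_normal((n, d))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
x = np.zeros(d)
for k in range(1, 5001):
    i = rng.integers(n)                         # sample one data point
    x = spp_step(x, A[i], b[i], alpha=10.0 / k) # decreasing step sizes
print(np.linalg.norm(x - x_true))
```

Unlike an explicit gradient step, the proximal update stays stable even for large alpha, which is the implicit-scheme stability the abstract alludes to.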

Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization

This work introduces a new variant of the SPP method for solving stochastic convex problems subject to an (in)finite intersection of constraints satisfying a linear regularity condition, and proves new nonasymptotic convergence results for convex Lipschitz continuous objective functions.

Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms

This work offers a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), called the tamed unadjusted stochastic Langevin algorithm (TUSLA), and provides finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks.
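A generic tamed Langevin step combines a damped gradient with Gaussian exploration noise. The sketch below uses the same simple taming factor as the tamed Euler scheme; TUSLA's precise taming factor differs in its exponents, so treat this only as an illustration of the structure.

```python
import numpy as np

def tamed_sgld_step(theta, grad, lam, beta, rng):
    """One tamed stochastic gradient Langevin step: a tamed drift plus
    exploration noise sqrt(2 * lam / beta) * xi.  Generic sketch only;
    TUSLA's actual taming factor is constructed differently."""
    tamed = grad / (1.0 + lam * np.linalg.norm(grad))
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
    return theta - lam * tamed + noise

# Non-convex toy: double-well potential U(t) = t^4/4 - t^2/2, minima at +-1.
rng = np.random.default_rng(3)
theta = np.array([50.0])
for _ in range(5000):
    g = theta**3 - theta                       # gradient of U
    theta = tamed_sgld_step(theta, g, lam=0.01, beta=20.0, rng=rng)
print(abs(theta[0]))
```

Starting far out at 50, an untamed Langevin step of size 0.01 would immediately overshoot (the raw drift is about 1.25e5); the taming caps each drift increment, so the iterate walks down into a well instead of diverging.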

Dynamical Behavior of a Stochastic Forward–Backward Algorithm Using Random Monotone Operators

It is shown that with probability one, the interpolated process obtained from the iterates is an asymptotic pseudotrajectory in the sense of Benaïm and Hirsch of the differential inclusion involving the sum of the mean operators.

Towards Stability and Optimality in Stochastic Gradient Descent

A new iterative procedure termed averaged implicit SGD (AI-SGD) employs an implicit update at each iteration, related to proximal operators in optimization, and achieves competitive performance with other state-of-the-art procedures.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
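The Adam update itself is compact and well known: exponential moving averages of the gradient and its square, with bias correction for the zero initialization. A minimal self-contained version:

```python
import numpy as np

def adam(grad, x0, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Textbook Adam: EMAs of the gradient (m) and squared gradient (v),
    bias-corrected, with an element-wise adaptive step."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)       # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)       # bias-corrected second moment
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Deterministic demo: minimize 0.5 * ||x||^2, whose gradient is x itself.
x = adam(lambda x: x, np.array([5.0, -3.0]), steps=500)
print(np.max(np.abs(x)))
```

The division by sqrt(v_hat) is what makes the step size per coordinate adaptive: coordinates with persistently large gradients get proportionally smaller steps.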

Explicit stabilised gradient descent for faster strongly convex optimisation

It is proved that RKCD nearly achieves the optimal convergence rate of the conjugate gradient algorithm, and the suboptimality of RKCD diminishes as the condition number of the quadratic function worsens.

Proximal-Proximal-Gradient Method

A key strength of PPG and S-PPG is their ability to directly handle a large sum of non-differentiable, non-separable functions with a constant stepsize independent of the number of functions.

Asymptotic and finite-sample properties of estimators based on stochastic gradients

The theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds, and suggests that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.

A note on tamed Euler approximations

Strong convergence results are established for tamed Euler schemes, which approximate stochastic differential equations whose superlinearly growing drift coefficients are locally one-sided Lipschitz continuous.
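The tamed Euler scheme that inspired TSGD replaces the Euler drift increment h·b(X) by h·b(X)/(1 + h·|b(X)|), so each drift increment is bounded by 1 regardless of how large b grows. A scalar sketch (illustrative parameters only):

```python
import numpy as np

def tamed_euler(b, sigma, x0, T, n, rng):
    """Tamed Euler scheme for dX = b(X) dt + sigma dW:
    X_{k+1} = X_k + h * b(X_k) / (1 + h * |b(X_k)|) + sigma * dW_k.
    The taming bounds each drift increment by 1, so a superlinearly
    growing drift cannot make the explicit scheme blow up."""
    h = T / n
    x = x0
    for _ in range(n):
        drift = b(x)
        x = x + h * drift / (1.0 + h * abs(drift)) \
              + sigma * np.sqrt(h) * rng.standard_normal()
    return x

rng = np.random.default_rng(2)
# Double-well SDE dX = (X - X^3) dt + 0.5 dW.  With x0 = 100 the raw drift
# is about -1e6, and the plain Euler scheme explodes; the tamed one does not.
x_final = tamed_euler(lambda x: x - x**3, 0.5, x0=100.0, T=10.0, n=1000, rng=rng)
print(abs(x_final))
```

This is exactly the stability mechanism the TSGD paper transfers from SDE discretization to stochastic optimization.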