# Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition

```bibtex
@article{Schmidt2013FastCO,
  title   = {Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition},
  author  = {Mark W. Schmidt and Nicolas Le Roux},
  journal = {arXiv: Optimization and Control},
  year    = {2013}
}
```

We consider optimizing a smooth convex function $f$ that is the average of a set of differentiable functions $f_i$, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient $f_i'$ is bounded by a linear function of the norm of the average gradient $f'$. We show that under these assumptions the basic stochastic gradient method with a sufficiently small constant step-size has an $O(1/k)$ convergence rate, and has a linear convergence rate if $g$…
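The setting above can be illustrated with a small sketch (the data and constants here are illustrative assumptions, not from the paper): a least-squares problem in the interpolation regime, where every $f_i$ is minimized at the same point as the average, so the strong growth condition holds with no noise term and plain SGD with a constant step-size converges linearly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star  # consistent system: every f_i is minimized at x_star,
                # so the strong growth condition holds without a noise term

def grad_i(x, i):
    # gradient of f_i(x) = 0.5 * (a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

L = max(np.linalg.norm(A[i]) ** 2 for i in range(n))  # per-example smoothness
step = 1.0 / L  # sufficiently small constant step-size

x = np.zeros(d)
for _ in range(20000):
    i = rng.integers(n)
    x -= step * grad_i(x, i)

# the distance to x_star shrinks geometrically in this interpolation setting
```

If instead the system were inconsistent (`b` not in the range of `A`), the same constant step-size would only drive the iterates into a neighborhood of the solution, which is why the growth condition matters.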

## 112 Citations

### On the linear convergence of the stochastic gradient method with constant step-size

- Computer Science, Mathematics · Optim. Lett.
- 2019

This paper provides a necessary condition for the linear convergence of SGM-CS that is weaker than SGC, and shows that the projected stochastic gradient method using a constant step-size, under the restricted strong convexity assumption, exhibits linear convergence to a noise-dominated region.

### Towards Asymptotic Optimality with Conditioned Stochastic Gradient Descent

- Computer Science · ArXiv
- 2020

This paper investigates a general class of stochastic gradient descent algorithms, called conditioned SGD, based on a preconditioning of the gradient direction, and establishes the almost sure convergence and the asymptotic normality for a broad class of conditioning matrices.

### Linear Convergence of Adaptive Stochastic Gradient Descent

- Mathematics, Computer Science · AISTATS
- 2020

We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions…
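A minimal sketch of the AdaGrad-Norm update described in this snippet: a single scalar step-size, shared by all coordinates, shrunk by the accumulated history of gradient norms. The test function and constants below are illustrative assumptions, not taken from the cited paper.

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, iters=1000):
    """Norm version of AdaGrad: one scalar step-size for all coordinates,
    divided by the square root of the accumulated squared gradient norms."""
    x = np.asarray(x0, dtype=float)
    acc = b0 ** 2
    for _ in range(iters):
        g = grad(x)
        acc += float(g @ g)              # accumulate squared gradient norms
        x = x - (eta / np.sqrt(acc)) * g
    return x

# strongly convex quadratic f(x) = 0.5 * ||x||^2, minimizer at the origin
x = adagrad_norm(lambda x: x, np.ones(3) * 5.0)
```

On a strongly convex problem the accumulated norm stays bounded, so the effective step-size stabilizes at a constant, which is the mechanism behind the linear rate the paper establishes.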

### Minimizing finite sums with the stochastic average gradient

- Computer Science · Math. Program.
- 2017

Numerical experiments indicate that the new SAG method often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
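The stochastic average gradient idea can be sketched as follows (a simplified illustration on least squares with made-up data; the step-size and implementation details differ from the cited paper): SAG keeps the most recent gradient of each $f_i$ in memory and steps along the average of these stored gradients, refreshing one entry per iteration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 30, 4
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
L = max(np.linalg.norm(A[i]) ** 2 for i in range(n))  # per-example smoothness

mem = np.zeros((n, d))   # last stored gradient of each f_i
avg = np.zeros(d)        # running average of the stored gradients
x = np.zeros(d)
step = 1.0 / (16 * L)    # conservative constant step-size

for _ in range(40000):
    i = rng.integers(n)
    g = A[i] * (A[i] @ x - b[i])  # fresh gradient of f_i(x) = 0.5*(a_i^T x - b_i)^2
    avg += (g - mem[i]) / n       # swap the stale gradient out of the average
    mem[i] = g
    x -= step * avg

# x approaches the least-squares solution of A x = b
```

Unlike plain SGD with a constant step-size, the averaged memory lets the noise cancel over time, so the iterates converge to the solution itself rather than to a neighborhood of it.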

### Stochastic Approximation of Smooth and Strongly Convex Functions: Beyond the O(1/T) Convergence Rate

- Computer Science · COLT
- 2019

This paper makes use of smoothness and strong convexity simultaneously to boost the convergence rate of SA, demonstrating that an $O(1/2^{T/\kappa}+F_*)$ risk bound is achievable in expectation and obtaining global linear convergence.

### General Convergence Analysis of Stochastic First-Order Methods for Composite Optimization

- Computer Science, Mathematics · J. Optim. Theory Appl.
- 2021

Stochastic composite convex optimization problems are considered in which the objective function satisfies a stochastic bounded gradient condition, with or without a quadratic functional growth property, covering a large class of objective functions.

### On the linear convergence of the projected stochastic gradient method with constant step-size

- Mathematics, Computer Science
- 2017

It is shown that both PSGM-CS and the proximal stochastic gradient method exhibit linear convergence to a noise-dominated region, whose distance to the optimal solution is proportional to $\gamma \sigma$, when SGC is violated up to an additive perturbation.

### A delayed proximal gradient method with linear convergence rate

- Computer Science · 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
- 2014

This paper derives an explicit expression that quantifies how the convergence rate depends on objective function properties and algorithm parameters such as step-size and the maximum delay, and reveals the trade-off between convergence speed and residual error.

### Unified Optimal Analysis of the (Stochastic) Gradient Method

- Computer Science, Mathematics · ArXiv
- 2019

This note gives a simple proof for the convergence of stochastic gradient methods on $\mu$-convex functions under a (milder than standard) $L$-smoothness assumption and recovers the exponential convergence rate.

### A globally convergent incremental Newton method

- Computer Science, Mathematics · Math. Program.
- 2015

It is shown that the incremental Newton method for minimizing the sum of a large number of strongly convex functions is globally convergent for a variable stepsize rule, and that under a gradient growth condition the convergence rate is linear for both variable and constant stepsize rules.

## References


### Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero

- Computer Science · Comput. Optim. Appl.
- 1998

The first convergence results of any kind for this computationally important case are derived; it is shown that a certain ε-approximate solution can be obtained, and the linear dependence of ε on the stepsize limit is established.

### An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule

- Computer Science · SIAM J. Optim.
- 1998

We consider an incremental gradient method with momentum term for minimizing the sum of continuously differentiable functions. This method uses a new adaptive stepsize rule that decreases the…

### Robust Stochastic Approximation Approach to Stochastic Programming

- Computer Science, Mathematics · SIAM J. Optim.
- 2009

It is intended to demonstrate that a properly modified SA approach can be competitive with, and even significantly outperform, the SAA method for a certain class of convex stochastic problems.

### Introductory Lectures on Convex Optimization - A Basic Course

- Computer Science · Applied Optimization
- 2004

It was in the middle of the 1980s when the seminal paper by Karmarkar opened a new epoch in nonlinear optimization, and it became more and more common that new methods were provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments.