# The Complexity of Finding Stationary Points with Stochastic Gradient Descent

@inproceedings{Drori2020TheCO, title={The Complexity of Finding Stationary Points with Stochastic Gradient Descent}, author={Yoel Drori and Ohad Shamir}, booktitle={ICML}, year={2020} }

We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions. We provide several results, implying that the classical $\mathcal{O}(\epsilon^{-4})$ upper bound (for making the average gradient norm less than $\epsilon$) cannot be improved upon, unless a combination of additional assumptions is made. Notably, this holds even if we limit ourselves to convex quadratic functions. We also show that for nonconvex…

## 31 Citations

Computing the Variance of Shuffling Stochastic Gradient Algorithms via Power Spectral Density Analysis

- Computer Science
- 2022

The analysis extends beyond SGD to SGD with momentum and to the stochastic Nesterov’s accelerated gradient method, and performs experiments on quadratic objective functions to test the validity of the approximation and the correctness of the findings.

Learning Halfspaces with Massart Noise Under Structured Distributions

- Computer Science, Mathematics
- COLT 2020
- 2020

This work identifies a smooth non-convex surrogate loss with the property that any approximate stationary point of this loss defines a halfspace that is close to the target halfspace, and can be used to solve the underlying learning problem.

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

- Computer Science
- 2019

These algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of the gradient and variance of the stochastic approximation for the gradient.

STOCHASTIC GRADIENT DESCENT

- Computer Science
- 2022

This paper develops a broad condition on the sequence of examples used by SGD that is sufficient to prove tight convergence rates in both strongly convex and non-convex settings, and proposes two new example-selection approaches using quasi-Monte-Carlo methods.

Branch-and-Bound Performance Estimation Programming: A Unified Methodology for Constructing Optimal Optimization Methods

- Computer Science
- 2022

The BnB-PEP methodology is applied to several setups for which the prior methodologies do not apply, and it obtains methods with bounds that improve upon prior state-of-the-art results, thereby systematically generating analytical convergence proofs.

Tight Convergence Rates of the Gradient Method on Hypoconvex Functions

- Mathematics
- 2022

We perform the first tight convergence analysis of the gradient method with fixed step sizes applied to the class of smooth hypoconvex (weakly-convex) functions, i.e., smooth nonconvex functions…

Tight convergence rates of the gradient method on smooth hypoconvex functions

- Mathematics
- 2022

We perform the first tight convergence analysis of the gradient method with varying step sizes when applied to smooth hypoconvex (weakly convex) functions. Hypoconvex functions are smooth nonconvex…

Latency considerations for stochastic optimizers in variational quantum algorithms

- Computer Science
- 2022

Stochastic optimization algorithms that yield stochastic processes emulating the dynamics of classical deterministic algorithms result in methods with theoretically superior worst-case iteration complexities, at the expense of greater per-iteration sample (shot) complexities.

Exact Optimal Accelerated Complexity for Fixed-Point Iterations

- Mathematics
- ICML
- 2022

Despite the broad use of fixed-point iterations throughout applied mathematics, the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not been…

A novel Gray-Scale spatial exploitation learning Net for COVID-19 by crawling Internet resources

- Computer Science
- Biomedical Signal Processing and Control
- 2022

## References

SHOWING 1-10 OF 27 REFERENCES

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

- Computer Science
- COLT
- 2019

A sharp analysis for Stochastic Gradient Descent is given and it is proved that SGD is able to efficiently escape from saddle points and find an approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions.

How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

- Computer Science
- NeurIPS
- 2018

If $f(x)$ is convex, to find its $\varepsilon$-approximate local minimum, the original SGD does not give an optimal rate, so this work designs an algorithm SGD3 with a near-optimal rate, improving the best known rate $O(\varepsilon^{-8/3})$.

Lower Bounds for Non-Convex Stochastic Optimization

- Computer Science, Mathematics
- ArXiv
- 2019

It is proved that (in the worst case) any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point, and it is established that stochastic gradient descent is minimax optimal in this model.

Stochastic Approximation and Recursive Algorithms and Applications

- Mathematics
- 2003

Introduction 1 Review of Continuous Time Models 1.1 Martingales and Martingale Inequalities 1.2 Stochastic Integration 1.3 Stochastic Differential Equations: Diffusions 1.4 Reflected Diffusions 1.5…

The Complexity of Making the Gradient Small in Stochastic Convex Optimization

- Computer Science
- COLT
- 2019

It is shown that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.

Convergence and efficiency of subgradient methods for quasiconvex minimization

- Mathematics
- Math. Program.
- 2001

The general subgradient projection method for minimizing a quasiconvex objective subject to a convex set constraint in a Hilbert space is studied, finding ε-solutions with an efficiency estimate of $O(\varepsilon^{-2})$, thus being optimal in the sense of Nemirovskii.

Introductory Lectures on Convex Optimization - A Basic Course

- Computer Science
- Applied Optimization
- 2004

It was in the middle of the 1980s, when the seminal paper by Karmarkar opened a new epoch in nonlinear optimization, and it became more and more common that the new methods were provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments.

On the Gap Between Strict-Saddles and True Convexity: An Omega(log d) Lower Bound for Eigenvector Approximation

- Computer Science, Mathematics
- ArXiv
- 2017

A lower bound on query complexity on rank-one principal component analysis (PCA) is proved by developing a "truncated" analogue of the $\chi^2$ Bayes-risk lower bound of Chen et al.