# SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

@inproceedings{Li2019SSRGDSS, title={SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points}, author={Zhize Li}, booktitle={NeurIPS}, year={2019} }

We analyze stochastic gradient algorithms for optimizing nonconvex problems. [...] We also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results.
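The core recipe in SSRGD is a SARAH/SPIDER-style stochastic recursive gradient estimator combined with a simple uniform perturbation whenever the gradient is small. A minimal illustrative sketch of such a loop in NumPy follows; the function name, hyperparameters, and update schedule are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def ssrgd_sketch(grad_i, n, x0, eta=0.1, b=8, m=8, g_thresh=1e-3,
                 radius=1e-2, T=200, seed=0):
    """Illustrative perturbed stochastic recursive gradient loop.

    grad_i(x, idx) returns the average gradient of the component
    functions indexed by idx at x; n is the number of components.
    """
    rng = np.random.default_rng(seed)
    x_prev = x = x0.astype(float)
    v = grad_i(x, np.arange(n))            # full gradient at the snapshot
    for t in range(1, T + 1):
        x_prev, x = x, x - eta * v         # descent step with the estimator
        idx = rng.integers(0, n, size=b)   # minibatch for the recursive update
        # SARAH/SPIDER-style recursive estimator:
        v = grad_i(x, idx) - grad_i(x_prev, idx) + v
        if t % m == 0:                     # periodically refresh with a full gradient
            v = grad_i(x, np.arange(n))
            if np.linalg.norm(v) <= g_thresh:
                # gradient is small: add a uniform perturbation to escape a saddle
                x = x + rng.uniform(-radius, radius, size=x.shape)
                v = grad_i(x, np.arange(n))
    return x
```

On a toy finite-sum quadratic (minimizing the mean of `0.5 * ||x - a_i||^2`), the loop converges to the mean of the `a_i` up to the perturbation radius.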

## Tables from this paper

## 26 Citations

### Escape saddle points faster on manifolds via perturbed Riemannian stochastic recursive gradient

- Computer Science, Mathematics · ArXiv
- 2020

A variant of the Riemannian stochastic recursive gradient method is proposed that achieves a second-order convergence guarantee and escapes saddle points using a simple perturbation.

### Escape saddle points by a simple gradient-descent based algorithm

- Computer Science · NeurIPS
- 2021

The main contribution is the idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieves a polynomial speedup in log n compared to perturbed gradient descent methods.
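The gradient-only Hessian power method can be sketched as follows: a Hessian-vector product is approximated by a finite difference of gradients, and power iteration on the shifted matrix L·I − H surfaces the direction of most negative curvature. This is a hedged illustration (not the cited paper's exact, robust procedure); `L` must upper-bound the largest Hessian eigenvalue:

```python
import numpy as np

def neg_curvature_direction(grad, x, L=1.0, r=1e-5, iters=100, seed=0):
    """Power iteration on L*I - H using only gradient evaluations.

    H v is approximated by the finite difference (grad(x + r*v) - grad(x)) / r.
    For L >= lambda_max(H), the dominant eigenvector of L*I - H corresponds
    to the smallest Hessian eigenvalue, i.e. the most negative curvature.
    """
    g0 = grad(x)
    rng = np.random.default_rng(seed)
    v = rng.normal(size=x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = (grad(x + r * v) - g0) / r    # Hessian-vector product via gradients
        v = L * v - hv                     # one power-iteration step on L*I - H
        v /= np.linalg.norm(v)
    curv = v @ ((grad(x + r * v) - g0) / r)  # Rayleigh quotient v^T H v
    return v, curv
```

On the saddle f(x) = (x0² − x1²)/2 at the origin, the iteration recovers the escape direction e1 with curvature estimate close to −1.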

### A Hybrid Stochastic Optimization Framework for Stochastic Composite Nonconvex Optimization

- Computer Science · ArXiv
- 2019

The main idea is to combine two stochastic estimators to create a new hybrid one; the paper first introduces this hybrid estimator and then investigates its fundamental properties to form a foundational theory for algorithmic development.

### Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

- Computer Science · ArXiv
- 2022

It is proved that both ProxSVRG+ and SSRGD enjoy automatic adaptation to local structure of the objective function, such as the Polyak-Łojasiewicz (PL) condition for nonconvex functions in the finite-sum case; i.e., both can automatically switch to faster global linear convergence without the restarts required in the prior work ProxSVRG (Reddi et al., 2016b).

### SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

- Computer Science
- 2018

This paper proposes SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with a proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with a provable convergence guarantee.
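The proximal-mapping extension replaces the plain descent step with a proximal step applied to the current recursive gradient estimator. A minimal sketch for an ℓ1 regularizer, whose proximal mapping is soft-thresholding (function names and parameters are illustrative, not SpiderBoost's exact interface):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal mapping of t * ||.||_1 (coordinate-wise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_spider_step(x, v, eta, lam):
    """One proximal gradient step x <- prox_{eta*lam*||.||_1}(x - eta*v),
    where v is the current stochastic recursive gradient estimator."""
    return soft_threshold(x - eta * v, eta * lam)
```

With the exact gradient of f(x) = 0.5·||x − a||² plugged in as the estimator, iterating this step converges to the lasso-style minimizer `soft_threshold(a, lam)`.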

### Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

- Computer Science · NeurIPS
- 2020

This paper proposes two efficient algorithms for computing approximate second-order stationary points (SOSPs) of problems with generic smooth non-convex objective functions and generic linear constraints, and shows that generic problem instances in this class can be solved efficiently.

### Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

- Computer Science, Mathematics · ALT
- 2022

The core idea of the framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to the local minima.

### Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

- Computer Science · ArXiv
- 2022

A new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD) is proposed, which combines the existing bias-variance reduced gradient estimator with parameter perturbation to achieve second-order optimal points in centralized nonconvex distributed optimization.

### A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization

- Computer Science · AISTATS
- 2020

It is shown that Anderson acceleration with Chebyshev polynomial can achieve the optimal convergence rate, which improves the previous result $O(\kappa\ln\frac{1}{\epsilon})$ provided by (Toth and Kelley, 2015) for quadratic functions.
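Anderson acceleration itself is a fixed-point scheme: keep a window of recent iterates, solve a small least-squares problem over their residuals, and mix the corresponding function values. A minimal type-II sketch, without the Chebyshev modification the paper analyzes (names and defaults are illustrative):

```python
import numpy as np

def anderson_accelerate(g, x0, m=5, iters=50, tol=1e-12):
    """Plain (type-II) Anderson acceleration for the fixed point x = g(x)."""
    X = [np.asarray(x0, dtype=float)]
    G = [g(X[0])]
    for _ in range(iters):
        F = [gk - xk for gk, xk in zip(G, X)]    # residuals f_j = g(x_j) - x_j
        if np.linalg.norm(F[-1]) < tol:
            break
        if len(X) == 1:
            x_new = G[0]                          # plain fixed-point step
        else:
            # Minimize || f_k + sum_j beta_j (f_j - f_k) || over beta,
            # which enforces the affine constraint sum(alpha) = 1.
            A = np.stack([f - F[-1] for f in F[:-1]], axis=1)
            beta, *_ = np.linalg.lstsq(A, -F[-1], rcond=None)
            alpha = np.append(beta, 1.0 - beta.sum())
            x_new = sum(a_j * gk for a_j, gk in zip(alpha, G))
        X.append(x_new)
        G.append(g(x_new))
        X, G = X[-(m + 1):], G[-(m + 1):]         # keep a window of m + 1 points
    return X[-1]
```

For an affine contraction g(x) = Bx + c this recovers the fixed point (I − B)⁻¹c in a handful of iterations, far faster than plain Picard iteration.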

### Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

- Computer Science · ArXiv
- 2022

Two variants of the Stochastic Variance Reduced Gradient Langevin Dynamics are studied and it is proved their convergence to the objective distribution in terms of KL-divergence under the sole assumptions of smoothness and Log-Sobolev inequality.

## References

SHOWING 1-10 OF 36 REFERENCES

### Stochastic Nested Variance Reduction for Nonconvex Optimization

- Computer Science · J. Mach. Learn. Res.
- 2020

A new stochastic gradient descent algorithm based on nested variance reduction that improves the best known gradient complexity of SVRG and SCSG and achieves better gradient complexity than the state-of-the-art algorithms.

### Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization

- Computer Science · COLT
- 2019

It is shown that Stabilized SVRG (a simple variant of SVRG) can find an $\epsilon$-second-order stationary point using only $\widetilde{O}(n^{2/3}/\epsilon^2 + n/\epsilon^{1.5})$ stochastic gradients, which is the first second-order guarantee for a simple variant of SVRG.

### Finding Local Minima via Stochastic Nested Variance Reduction

- Computer Science · ArXiv
- 2018

Two algorithms that can find local minima faster than the state-of-the-art algorithms in both finite-sum and general stochastic nonconvex optimization are proposed and the acceleration brought by third-order smoothness of the objective function is explored.

### Non-convex Finite-Sum Optimization Via SCSG Methods

- Computer Science · NIPS
- 2017

A class of algorithms is proposed as variants of the stochastically controlled stochastic gradient (SCSG) methods for the smooth non-convex finite-sum optimization problem; experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layer neural networks in terms of both training and validation loss.

### SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

- Computer Science, Mathematics · NeurIPS
- 2018

This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.

### Accelerated Methods for Non-Convex Optimization

- Computer Science · ArXiv
- 2016

The method improves upon the complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$.

### Variance Reduction for Faster Non-Convex Optimization

- Computer Science · ICML
- 2016

This work considers the fundamental problem in non-convex optimization of efficiently reaching a stationary point, and proposes a first-order minibatch stochastic method that converges with an $O(1/\varepsilon)$ rate, and is faster than full gradient descent by $\Omega(n^{1/3})$.

### A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

- Computer Science · NeurIPS
- 2018

This work proposes a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+, which generalizes the best results given by the SCSG algorithm and achieves global linear convergence rate without restart.

### Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition

- Computer Science, Mathematics · COLT
- 2015

This paper identifies the strict saddle property for non-convex problems, which allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.

### SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

- Computer Science
- 2018

This paper proposes SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with a proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with a provable convergence guarantee.