• Corpus ID: 126146724

SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

@inproceedings{Li2019SSRGDSS,
  title={SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points},
  author={Zhize Li},
  booktitle={NeurIPS},
  year={2019}
}
  • Zhize Li
  • Published in NeurIPS, 19 April 2019
  • Computer Science
We analyze stochastic gradient algorithms for optimizing nonconvex problems. […] We also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems and prove the corresponding convergence results.
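
The abstract describes the method only at a high level; as a rough orientation, a minimal sketch of a perturbed stochastic recursive gradient loop of this kind is given below. It assumes a finite-sum objective f(x) = (1/n) * sum_i f_i(x); the step size eta, minibatch size b, epoch length m, gradient threshold g_thresh, and perturbation radius are illustrative placeholders rather than the paper's tuned values, and this is not the author's exact pseudocode.

import numpy as np

def ssrgd_sketch(grad_full, grad_batch, x0, n, eta=0.01, b=32, m=100,
                 g_thresh=1e-3, radius=1e-3, epochs=50, rng=None):
    # grad_full(x): full gradient (1/n) * sum_i grad f_i(x)
    # grad_batch(x, idx): minibatch gradient averaged over the index set idx
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        v = grad_full(x)                      # epoch anchor: full gradient
        if np.linalg.norm(v) <= g_thresh:     # small gradient: perturb to escape saddle points
            x = x + rng.uniform(-radius, radius, size=x.shape)  # simple box perturbation
            v = grad_full(x)
        for _ in range(m):                    # inner loop with the recursive estimator
            x_new = x - eta * v
            idx = rng.integers(0, n, size=b)
            # SARAH/SPIDER-style recursive update of the gradient estimator
            v = grad_batch(x_new, idx) - grad_batch(x, idx) + v
            x = x_new
    return x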

Citations

Escape saddle points faster on manifolds via perturbed Riemannian stochastic recursive gradient

TLDR
A variant of the Riemannian stochastic recursive gradient method that can achieve a second-order convergence guarantee and escape saddle points using simple perturbation is proposed.

Escape saddle points by a simple gradient-descent based algorithm

TLDR
The main contribution is an idea for implementing a robust Hessian power method using only gradients, which can find negative-curvature directions near saddle points and achieves a polynomial speedup in log n over perturbed gradient descent methods.
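
The "Hessian power method using only gradients" idea can be illustrated with a finite-difference power iteration; the sketch below is my own illustration under the assumption that f is L-smooth (L and the finite-difference radius r are placeholders), not the paper's algorithm. Power iteration on L*I - H, with Hessian-vector products approximated by gradient differences, converges to the eigenvector of the Hessian H with the smallest eigenvalue, i.e. the direction of most negative curvature.

import numpy as np

def negative_curvature_direction(grad, x, L=1.0, r=1e-4, iters=50, rng=None):
    # grad(x): gradient oracle of f. Returns an estimated direction of most
    # negative curvature of f at x and the corresponding curvature value.
    rng = rng or np.random.default_rng(0)
    g0 = grad(x)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = (grad(x + r * v) - g0) / r       # finite-difference Hessian-vector product H v
        v = L * v - hv                        # apply (L*I - H)
        v /= np.linalg.norm(v)
    curvature = v @ ((grad(x + r * v) - g0) / r)  # Rayleigh quotient estimate v^T H v
    return v, curvature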

A Hybrid Stochastic Optimization Framework for Stochastic Composite Nonconvex Optimization

TLDR
The main idea is to combine two stochastic estimators to create a new hybrid estimator; the paper first introduces this hybrid estimator and then investigates its fundamental properties to form a foundation for algorithmic development.
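
As a rough illustration of such a hybrid estimator, the update below takes a convex combination of a SARAH-style recursive term and a plain unbiased stochastic gradient; the mixing weight beta and the use of two independent samples i and j are illustrative assumptions, not the paper's exact construction.

def hybrid_estimator(grad_comp, x_new, x_old, v_old, i, j, beta=0.9):
    # grad_comp(x, k): stochastic gradient of component k at x
    recursive = grad_comp(x_new, i) - grad_comp(x_old, i) + v_old  # biased, low-variance SARAH-type part
    unbiased = grad_comp(x_new, j)                                 # unbiased SGD part
    return beta * recursive + (1.0 - beta) * unbiased              # trade bias against variance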

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

TLDR
It is proved that both ProxSVRG+ and SSRGD automatically adapt to local structure of the objective function, such as the Polyak-Łojasiewicz (PL) condition for nonconvex functions in the finite-sum case; i.e., both can automatically switch to faster global linear convergence without the restarts required in the prior work ProxSVRG (Reddi et al., 2016b).
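
For reference, the Polyak-Łojasiewicz (PL) condition mentioned here requires that $\frac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x)-f^*)$ for some $\mu>0$ and all $x$, where $f^*$ is the minimum value of $f$; under this condition gradient-type methods can attain global linear convergence even when $f$ is nonconvex.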

SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

TLDR
This paper proposes SpiderBoost as an improved scheme that allows a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and that can be extended with a proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantees.

Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

TLDR
This paper proposes two efficient algorithms for computing approximate second-order stationary points (SOSPs) of problems with generic smooth non-convex objective functions and generic linear constraints, and shows that generic problem instances in this class can be solved efficiently.
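
(For context, an $\epsilon$-approximate second-order stationary point of an unconstrained problem is commonly defined as a point $x$ with $\|\nabla f(x)\| \le \epsilon$ and $\lambda_{\min}(\nabla^2 f(x)) \ge -\sqrt{\rho\epsilon}$, where $\rho$ is the Hessian Lipschitz constant; the linearly constrained setting of this paper uses a suitably adapted notion.)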

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

TLDR
The core idea of the framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to local minima.

Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

TLDR
A new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD) is proposed, which combines an existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization.

A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization

TLDR
It is shown that Anderson acceleration with Chebyshev polynomials can achieve the optimal convergence rate, improving the previous $O(\kappa\ln\frac{1}{\epsilon})$ result of Toth and Kelley (2015) for quadratic functions.
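
For orientation, a minimal numpy sketch of plain Anderson acceleration with memory m for a fixed-point map g is given below; it illustrates only the base scheme (without damping) and omits the Chebyshev-polynomial modification analyzed in the paper, and all parameter values are illustrative.

import numpy as np

def anderson_acceleration(g, x0, m=5, iters=50, tol=1e-10):
    # Seeks x with g(x) = x by mixing the last few fixed-point iterates.
    x = np.asarray(x0, dtype=float).copy()
    X_hist, F_hist = [], []                  # histories of iterates and residuals
    for _ in range(iters):
        gx = g(x)
        f = gx - x                           # residual of the fixed-point equation
        if np.linalg.norm(f) < tol:
            break
        X_hist.append(x.copy()); F_hist.append(f.copy())
        if len(F_hist) > m + 1:              # keep at most m+1 past iterates
            X_hist.pop(0); F_hist.pop(0)
        if len(F_hist) == 1:
            x = gx                           # plain fixed-point step at the start
            continue
        k = len(F_hist) - 1
        dF = np.stack([F_hist[i + 1] - F_hist[i] for i in range(k)], axis=1)
        dG = np.stack([(X_hist[i + 1] + F_hist[i + 1]) - (X_hist[i] + F_hist[i])
                       for i in range(k)], axis=1)
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)   # least-squares mixing weights
        x = gx - dG @ gamma                  # Anderson-mixed update
    return x

For a contractive map g this typically converges much faster than plain fixed-point iteration.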

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

TLDR
Two variants of Stochastic Variance Reduced Gradient Langevin Dynamics are studied, and their convergence to the objective distribution in terms of KL divergence is proved under the sole assumptions of smoothness and a log-Sobolev inequality.
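
For orientation, one step of stochastic gradient Langevin dynamics with an SVRG-type variance-reduced estimator looks roughly like the sketch below; the step size eta, inverse temperature beta, and batch size are illustrative, and this is not the exact variant analyzed in the paper.

import numpy as np

def svrg_ld_step(grad_comp, x, x_snapshot, full_grad_snapshot, n, eta=1e-3,
                 beta=1.0, batch=32, rng=None):
    # grad_comp(x, i): stochastic gradient of component i at x
    rng = rng or np.random.default_rng(0)
    idx = rng.integers(0, n, size=batch)
    # SVRG estimator: unbiased, with variance shrinking near the snapshot point
    v = (sum(grad_comp(x, i) - grad_comp(x_snapshot, i) for i in idx) / batch
         + full_grad_snapshot)
    # Langevin step: gradient descent plus Gaussian noise targeting exp(-beta * f)
    noise = rng.standard_normal(x.shape)
    return x - eta * v + np.sqrt(2.0 * eta / beta) * noise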

References

SHOWING 1-10 OF 36 REFERENCES

Stochastic Nested Variance Reduction for Nonconvex Optimization

TLDR
A new stochastic gradient descent algorithm based on nested variance reduction is proposed, which improves the best known gradient complexity of SVRG and SCSG and achieves better gradient complexity than state-of-the-art algorithms.

Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization

TLDR
It is shown that Stabilized SVRG (a simple variant of SVRG) can find an $\epsilon$-second-order stationary point using only $\widetilde{O}(n^{2/3}/\epsilon^2 + n/\epsilon^{1.5})$ stochastic gradients, which is the first second-order guarantee for a simple variant of SVRG.
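
For comparison with the recursive estimator used in SSRGD-type methods, the SVRG estimator keeps a fixed snapshot $\tilde{x}$ within each epoch and uses $v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(\tilde{x}) + \nabla f(\tilde{x})$, which is unbiased, whereas the recursive estimator moves its reference point every iteration.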

Finding Local Minima via Stochastic Nested Variance Reduction

TLDR
Two algorithms that can find local minima faster than the state-of-the-art algorithms in both finite-sum and general stochastic nonconvex optimization are proposed and the acceleration brought by third-order smoothness of the objective function is explored.

Non-convex Finite-Sum Optimization Via SCSG Methods

TLDR
A class of algorithms, given as variants of the stochastically controlled stochastic gradient (SCSG) methods, is developed for the smooth non-convex finite-sum optimization problem; experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layer neural networks in terms of both training and validation loss.

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

TLDR
This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost, and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.
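
Concretely, the path-integrated (recursive) estimator at the core of SPIDER can be written as $v_t = \frac{1}{b}\sum_{i \in I_t}\big(\nabla f_i(x_t) - \nabla f_i(x_{t-1})\big) + v_{t-1}$, with $v_0$ initialized to a full or large-batch gradient, so the estimation error accumulates only through the small successive differences $x_t - x_{t-1}$.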

Accelerated Methods for Non-Convex Optimization

TLDR
The method improves upon the complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$.

Variance Reduction for Faster Non-Convex Optimization

TLDR
This work considers the fundamental problem in non-convex optimization of efficiently reaching a stationary point, and proposes a first-order minibatch stochastic method that converges with an $O(1/\varepsilon)$ rate, and is faster than full gradient descent by $\Omega(n^{1/3})$.

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

TLDR
This work proposes a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+, which generalizes the best results given by the SCSG algorithm and achieves a global linear convergence rate without restarts.
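
In the composite setting handled by ProxSVRG+, the plain gradient step is replaced by a proximal step $x_{t+1} = \mathrm{prox}_{\eta h}(x_t - \eta v_t)$; for the common example $h(x) = \lambda\|x\|_1$ the proximal mapping is soft-thresholding, as in the short illustrative sketch below (lam and eta are placeholders).

import numpy as np

def prox_l1(z, thresh):
    # Proximal mapping of h(x) = thresh * ||x||_1 (soft-thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def prox_gradient_step(x, v, eta, lam):
    # Variance-reduced gradient step followed by the proximal mapping of lam * ||.||_1
    return prox_l1(x - eta * v, eta * lam)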

Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition

TLDR
This paper identifies a strict saddle property for non-convex problems that allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.

SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

TLDR
This paper proposes SpiderBoost as an improved scheme that allows a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and that can be extended with a proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantees.