• Corpus ID: 126146724

@inproceedings{Li2019SSRGDSS,
  title={SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points},
  author={Zhize Li},
  booktitle={NeurIPS},
  year={2019}
}
• Zhize Li
• Published in NeurIPS 19 April 2019
• Computer Science
We analyze stochastic gradient algorithms for optimizing nonconvex problems. […] Key result: we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results.
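The abstract describes a simple recursive-gradient method with perturbation for escaping saddle points. Below is a minimal sketch of that general scheme on a finite-sum problem; the function names, step size, epoch length, and once-per-epoch perturbation schedule are illustrative assumptions, not the paper's exact algorithm or parameters.

```python
import numpy as np

def ssrgd_sketch(grads, full_grad, x0, eta=0.1, epochs=20, m=50,
                 perturb=1e-3, rng=None):
    """Stochastic recursive gradient descent with perturbation (sketch).

    Maintains a SARAH-style recursive gradient estimator, refreshed with
    a full gradient at the start of each epoch, and adds a small uniform
    perturbation once per epoch to help escape saddle points.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(grads)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        v = full_grad(x)                 # exact gradient at epoch start
        x_prev, x = x, x - eta * v
        for _ in range(m):
            i = rng.integers(n)          # sample one component function
            v = grads[i](x) - grads[i](x_prev) + v   # recursive estimator
            x_prev, x = x, x - eta * v
        x = x + perturb * rng.uniform(-1.0, 1.0, size=x.shape)
    return x
```

On a toy finite sum such as f_i(x) = ½‖x − a_i‖², the recursive estimator stays exact (the stochastic differences cancel) and the iterates contract to the mean of the a_i up to the perturbation scale.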

## Citations

### Escape saddle points faster on manifolds via perturbed Riemannian stochastic recursive gradient

• Computer Science, Mathematics
ArXiv
• 2020
A variant of the Riemannian stochastic recursive gradient method that can achieve a second-order convergence guarantee and escape saddle points using simple perturbation is proposed.

• Computer Science
NeurIPS
• 2021
The main contribution is an idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieve a polynomial speedup in log n compared to perturbed gradient descent methods.

### A Hybrid Stochastic Optimization Framework for Stochastic Composite Nonconvex Optimization

• Computer Science
ArXiv
• 2019
The main idea is to combine two stochastic estimators to create a new hybrid one; the paper first introduces this hybrid estimator and then investigates its fundamental properties to form a foundational theory for algorithmic development.
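The idea of blending two estimators can be sketched as follows: each step forms a convex combination of a SARAH-type recursive difference (low variance, biased) and a fresh unbiased stochastic gradient. The mixing weight `beta`, step size, and the toy loop structure are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def hybrid_sgd(grads, x0, eta=0.05, beta=0.9, steps=500, rng=None):
    """SGD with a hybrid gradient estimator (sketch).

    The estimator is a convex combination of a SARAH-type recursive term
    and a plain stochastic gradient, with mixing weight beta.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(grads)
    x = np.asarray(x0, dtype=float).copy()
    v = grads[rng.integers(n)](x)        # initial stochastic gradient
    for _ in range(steps):
        x_prev, x = x, x - eta * v
        i, j = rng.integers(n), rng.integers(n)
        recursive = grads[i](x) - grads[i](x_prev) + v     # SARAH-type term
        v = beta * recursive + (1.0 - beta) * grads[j](x)  # hybrid blend
    return x
```

The recursive term keeps the estimator's variance low between snapshots, while the unbiased term continually corrects the accumulated bias without any full-gradient computation.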

### Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

• Computer Science
ArXiv
• 2022
It is proved that both ProxSVRG+ and SSRGD enjoy automatic adaptation to the local structure of the objective function, such as the Polyak-Łojasiewicz (PL) condition for nonconvex functions in the finite-sum case, i.e., both can automatically switch to faster global linear convergence without the restarts performed in the prior work ProxSVRG (Reddi et al., 2016b).

### SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

• Zhe Wang
• Computer Science
• 2018
This paper proposes SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with a proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantees.

### Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

• Computer Science
NeurIPS
• 2020
This paper proposes two efficient algorithms for computing approximate second-order stationary points (SOSPs) of problems with generic smooth non-convex objective functions and generic linear constraints, and shows that generic problem instances in this class can be solved efficiently.

### Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

• Computer Science, Mathematics
ALT
• 2022
The core idea of the framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to the local minima.

### Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

• Computer Science
ArXiv
• 2022
A new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD) is proposed that combines an existing bias-variance reduced gradient estimator with parameter perturbation to reach second-order optimal points in centralized nonconvex distributed optimization.

### A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization

• Computer Science
AISTATS
• 2020
It is shown that Anderson acceleration with Chebyshev polynomials can achieve the optimal convergence rate, which improves the previous result $O(\kappa\ln\frac{1}{\epsilon})$ of Toth and Kelley (2015) for quadratic functions.
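For reference, plain Anderson acceleration for a fixed-point map x = g(x) can be sketched compactly: each step combines the last m evaluations of g with weights (summing to one) that minimize the combined residual in a least-squares sense. The soft handling of the sum-to-one constraint via an appended row is an implementation choice, and the Chebyshev-specific analysis from the paper is not reproduced here.

```python
import numpy as np

def anderson(g, x0, m=5, iters=30):
    """Anderson acceleration for the fixed-point iteration x = g(x) (sketch)."""
    xs = [np.asarray(x0, dtype=float)]
    gs = [g(xs[0])]
    for _ in range(iters):
        mk = min(m, len(xs))
        # Residual matrix: columns are g(x_k) - x_k for the recent iterates.
        F = np.stack([gk - xk for xk, gk in zip(xs[-mk:], gs[-mk:])], axis=1)
        # Append a row of ones to softly enforce sum(alpha) == 1.
        A = np.vstack([F, np.ones((1, mk))])
        b = np.zeros(A.shape[0])
        b[-1] = 1.0
        alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
        alpha = alpha / alpha.sum()          # renormalize the weights exactly
        x_new = np.stack(gs[-mk:], axis=1) @ alpha
        xs.append(x_new)
        gs.append(g(x_new))
    return xs[-1]
```

On a linear contraction g(x) = Ax + b this recovers the fixed point far faster than plain iteration, consistent with the known equivalence to Krylov methods in the linear case.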

### Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

• Computer Science
ArXiv
• 2022
Two variants of Stochastic Variance Reduced Gradient Langevin Dynamics are studied, and their convergence to the objective distribution is proved in terms of KL divergence under the sole assumptions of smoothness and a log-Sobolev inequality.

## References

SHOWING 1-10 OF 36 REFERENCES

### Stochastic Nested Variance Reduction for Nonconvex Optimization

• Computer Science
J. Mach. Learn. Res.
• 2020
A new stochastic gradient descent algorithm based on nested variance reduction is proposed that improves the best-known gradient complexity of SVRG and SCSG, achieving better gradient complexity than the state-of-the-art algorithms.

### Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization

• Computer Science
COLT
• 2019
It is shown that Stabilized SVRG (a simple variant of SVRG) can find an $\epsilon$-second-order stationary point using only $\widetilde{O}(n^{2/3}/\epsilon^2+n/\epsilon^{1.5})$ stochastic gradients, which is the first second-order guarantee for a simple variant of SVRG.

### Finding Local Minima via Stochastic Nested Variance Reduction

• Computer Science
ArXiv
• 2018
Two algorithms that can find local minima faster than the state-of-the-art algorithms in both finite-sum and general stochastic nonconvex optimization are proposed and the acceleration brought by third-order smoothness of the objective function is explored.

### Non-convex Finite-Sum Optimization Via SCSG Methods

• Computer Science
NIPS
• 2017
A class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods, for the smooth non-convex finite-sum optimization problem, which demonstrates that SCSG outperforms stochastic gradient methods on training multi-layer neural networks in terms of both training and validation loss.

### SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

• Computer Science, Mathematics
NeurIPS
• 2018
This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.

### Accelerated Methods for Non-Convex Optimization

• Computer Science
ArXiv
• 2016
The method improves upon the complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$.

### Variance Reduction for Faster Non-Convex Optimization

• Computer Science
ICML
• 2016
This work considers the fundamental problem in non-convex optimization of efficiently reaching a stationary point, and proposes a first-order minibatch stochastic method that converges with an $O(1/\varepsilon)$ rate, and is faster than full gradient descent by $\Omega(n^{1/3})$.

### A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

• Computer Science
NeurIPS
• 2018
This work proposes a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+, which generalizes the best results given by the SCSG algorithm and achieves global linear convergence rate without restart.
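The proximal variance-reduction template this line describes can be sketched for an l1-regularized finite sum: an SVRG estimator handles the smooth part and a soft-thresholding prox handles the nonsmooth regularizer. The epoch length, step size, and regularizer choice below are illustrative; this is the generic ProxSVRG template rather than the exact ProxSVRG+ parameter settings.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_svrg(grads, full_grad, x0, lam=0.01, eta=0.1, epochs=15, m=50, rng=None):
    """Proximal SVRG (sketch): variance-reduced gradient steps for the
    smooth part, followed by the prox of the l1 regularizer."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(grads)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        snap = x.copy()
        mu = full_grad(snap)                        # full gradient at snapshot
        for _ in range(m):
            i = rng.integers(n)
            v = grads[i](x) - grads[i](snap) + mu   # SVRG estimator
            x = soft_threshold(x - eta * v, eta * lam)
    return x
```

The prox step is what makes the scheme applicable to nonsmooth composite objectives: the gradient step only sees the smooth part, and the regularizer is handled exactly in closed form.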

### Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition

• Computer Science, Mathematics
COLT
• 2015
This paper identifies the strict saddle property for non-convex problems that allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
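The escape mechanism behind results like this can be seen on a toy strict saddle: f(x, y) = x²/2 − y²/2 + y⁴/4 has a saddle at the origin and minima at (0, ±1). Exact gradient descent started at the origin never moves, while gradient steps with injected noise drift down the negative-curvature direction. Note this sketch adds explicit Gaussian noise for clarity, whereas the paper relies on the inherent noise of stochastic gradients; the function and parameters are illustrative.

```python
import numpy as np

def noisy_gd(grad, x0, eta=0.1, sigma=0.01, steps=500, rng=None):
    """Gradient descent with isotropic noise injection (sketch).

    The noise kicks iterates off strict saddle points, where the exact
    gradient vanishes but a negative-curvature direction exists.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = x - eta * grad(x) + sigma * rng.normal(size=x.shape)
    return x

# Toy strict saddle: f(x, y) = x^2/2 - y^2/2 + y^4/4.
# The origin is a saddle; the minima are at (0, +1) and (0, -1).
def grad_saddle(z):
    x, y = z
    return np.array([x, -y + y**3])
```

Starting from the origin, the y-coordinate is unstable under the noisy dynamics and settles near ±1, while x contracts to zero up to the noise scale.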

### SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

• Zhe Wang
• Computer Science
• 2018
This paper proposes SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with a proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantees.