• Corpus ID: 247762779

@inproceedings{Guo2020EscapingSP,
  title={Escaping Saddle Points Efficiently with Occupation-Time-Adapted Perturbations},
  author={Xin Guo and Jiequn Han and Mahan Tajrobehkar and Wenpin Tang},
  year={2020}
}
• Published 9 May 2020
• Computer Science
Motivated by the super-diffusivity of self-repelling random walk, which has roots in statistical physics, this paper develops a new perturbation mechanism for optimization algorithms. In this mechanism, perturbations are adapted to the history of states via the notion of occupation time. After integrating this mechanism into the framework of perturbed gradient descent (PGD) and perturbed accelerated gradient descent (PAGD), two new algorithms are proposed: perturbed gradient descent adapted to…
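The occupation-time idea can be illustrated with a short, hedged sketch. The Python snippet below is only a minimal reading of the abstract: it approximates occupation time by a visit count over a coarse grid of cells and, whenever the gradient is small, biases the random perturbation away from cells the trajectory has already spent time in. The grid discretization, the repulsion rule, and every constant (lr, g_thresh, radius, cell) are illustrative assumptions, not the construction used in the paper.

import numpy as np

def pgd_occupation_time(grad, x0, lr=1e-3, g_thresh=1e-3, radius=1e-2,
                        cell=0.1, n_iters=10_000, seed=0):
    # Sketch only: "occupation time" is approximated by a visit count over a
    # coarse grid of cells, and the perturbation near a flat region is biased
    # away from cells the trajectory has already spent time in.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    occupation = {}  # coarse cell -> number of iterations spent there
    for _ in range(n_iters):
        key = tuple(np.floor(x / cell).astype(int))
        occupation[key] = occupation.get(key, 0) + 1
        g = grad(x)
        if np.linalg.norm(g) > g_thresh:
            x = x - lr * g            # ordinary gradient step
            continue
        # Candidate saddle: perturb, repelled from well-visited regions.
        repel = np.zeros_like(x)
        for k, t in occupation.items():
            d = x - (np.asarray(k, dtype=float) + 0.5) * cell
            repel += t * d / (np.linalg.norm(d) ** 3 + 1e-12)
        direction = rng.normal(size=x.shape) + repel
        direction /= np.linalg.norm(direction) + 1e-12
        x = x - lr * g + radius * direction
    return x

For example, on f(x) = x[0]**2 - x[1]**2 with grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]]) and x0 = np.zeros(2), plain gradient descent never leaves the saddle at the origin, while the perturbed iterates drift into the negative-curvature direction.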

References

Showing 1-10 of 73 references

COLT 2018, Computer Science
To the best of the authors' knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in the setting of finding a first-order stationary point.

Escaping Saddle Points Faster with Stochastic Momentum (ICLR 2020, Computer Science)
Stochastic momentum improves deep network training because it modifies SGD to escape saddle points faster and, consequently, to more quickly find a second-order stationary point.

How to Escape Saddle Points Efficiently (ICML 2017, Computer Science, Mathematics)
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension, which shows that perturbed gradient descent can escape saddle points almost for free. (A sketch of this perturbation mechanism appears after the reference list.)

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis (COLT 2017, Computer Science, Mathematics)
The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks. (See the SGLD sketch after the reference list.)

Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics (2020, Computer Science)
Non-reversible Stochastic Gradient Langevin Dynamics (NSGLD), based on a discretization of the non-reversible Langevin diffusion, is studied, and finite-time performance bounds are provided for the global convergence of NSGLD on stochastic non-convex optimization problems. (The SGLD sketch after the reference list includes an optional non-reversible drift term.)

NIPS 2017, Computer Science
This paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape.

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization (NIPS 2014, Computer Science)
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods; it applies the algorithm to deep and recurrent neural network training and provides numerical evidence for its superior optimization performance. (A sketch of the rescaled Newton step appears after the reference list.)

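For concreteness, a few of the mechanisms referenced above are sketched below in Python; these are hedged illustrations written from the one-sentence summaries, not the authors' implementations. First, the perturbation mechanism of "How to Escape Saddle Points Efficiently": gradient descent plus an occasional uniform perturbation drawn from a small ball whenever the gradient is small and no perturbation was added recently. The step size, thresholds, radius, and cooldown below are placeholder values, not the constants from that paper's analysis.

import numpy as np

def perturbed_gd(grad, x0, lr=1e-3, g_thresh=1e-3, radius=1e-2,
                 cooldown=50, n_iters=10_000, seed=0):
    # Plain gradient descent, plus an occasional small random perturbation
    # whenever the gradient is small (a candidate saddle point).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    last_perturb = -cooldown
    for t in range(n_iters):
        if np.linalg.norm(grad(x)) <= g_thresh and t - last_perturb >= cooldown:
            xi = rng.normal(size=x.shape)
            # Rescale to a uniform sample from a ball of the given radius.
            xi *= radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(xi)
            x = x + xi
            last_perturb = t
        x = x - lr * grad(x)
    return x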
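Next, the Langevin-dynamics references rely on persistent Gaussian noise rather than occasional perturbations. A minimal SGLD sketch follows, where beta is an inverse-temperature parameter and skew is an optional antisymmetric matrix standing in for the non-reversible drift of NSGLD; the parameter names, values, and the non-reversible reading are assumptions made for illustration.

import numpy as np

def sgld(stoch_grad, x0, lr=1e-3, beta=10.0, skew=None, n_iters=10_000, seed=0):
    # stoch_grad(x) returns a (possibly noisy) gradient estimate.
    # skew, if given, is an antisymmetric matrix J; the drift then becomes
    # (I + J) @ grad, a simple reading of the non-reversible variant.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = stoch_grad(x)
        drift = g if skew is None else g + skew @ g
        x = x - lr * drift + np.sqrt(2.0 * lr / beta) * rng.normal(size=x.shape)
    return x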
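Finally, the saddle-free Newton idea from the last reference can be read as a rescaled Newton step: replace the Hessian's eigenvalues by their absolute values so that negative-curvature directions are descended rather than ascended. The damping term below is an illustrative regularizer and not part of the summary above.

import numpy as np

def saddle_free_newton_step(grad, hess, x, damping=1e-3):
    # One rescaled Newton step: solve (|H| + damping * I) d = g and move to x - d.
    g = grad(x)
    eigvals, eigvecs = np.linalg.eigh(hess(x))      # symmetric Hessian assumed
    abs_h = eigvecs @ np.diag(np.abs(eigvals) + damping) @ eigvecs.T
    return x - np.linalg.solve(abs_h, g)

Because the rescaled matrix is positive definite, the resulting direction is always a descent direction, which is what lets this step move through saddle regions that stall plain gradient descent or reverse a plain Newton step.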