Corpus ID: 247762779

Escaping Saddle Points Efficiently with Occupation-Time-Adapted Perturbations

Xin Guo, Jiequn Han, Mahan Tajrobehkar, Wenpin Tang
Motivated by the super-diffusivity of self-repelling random walk, which has roots in statistical physics, this paper develops a new perturbation mechanism for optimization algorithms. In this mechanism, perturbations are adapted to the history of states via the notion of occupation time. After integrating this mechanism into the framework of perturbed gradient descent (PGD) and perturbed accelerated gradient descent (PAGD), two new algorithms are proposed: perturbed gradient descent adapted to… 
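The abstract describes perturbations that are adapted to the history of iterates through their occupation time, in the spirit of a self-repelling random walk. As a rough illustrative sketch only (not the authors' algorithm), the code below runs plain gradient descent but, whenever the gradient is small, draws a perturbation that is biased away from coarse spatial cells the iterate has already visited often. The histogram-based occupation proxy, the retry heuristic, the function name `pgd_occupation_sketch`, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def pgd_occupation_sketch(grad, x0, lr=0.01, g_thresh=1e-3, radius=0.1,
                          n_iters=1000, bin_width=0.5, rng=None):
    """Toy perturbed gradient descent with an occupation-time-flavored
    perturbation: near stationary points, prefer perturbation directions
    that land in less-visited spatial cells. This is a simplified stand-in
    for the paper's mechanism, not the exact construction."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    visits = {}  # coarse histogram of visited cells: a crude proxy for occupation time

    def cell_of(z):
        return tuple(np.floor(z / bin_width).astype(int))

    for _ in range(n_iters):
        g = grad(x)
        cell = cell_of(x)
        visits[cell] = visits.get(cell, 0) + 1
        if np.linalg.norm(g) < g_thresh:
            # Near a stationary point: perturb on a sphere of given radius,
            # re-drawing the direction (up to 10 times) if the candidate cell
            # has been visited more often than the current one.
            for _ in range(10):
                d = rng.normal(size=x.shape)
                d /= np.linalg.norm(d)
                cand = x + radius * d
                if visits.get(cell_of(cand), 0) <= visits.get(cell, 0):
                    break
            x = cand
        else:
            x = x - lr * g  # ordinary gradient step away from stationary points
    return x
```

For example, on f(x, y) = x^2 + y^4 - y^2, which has a strict saddle at the origin and minima at (0, ±1/√2), starting the sketch exactly at the saddle lets the perturbation kick in immediately and the iterates descend to one of the minima.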

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
To the best of our knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in the setting of finding a first-order stationary point.
Escaping Saddle Points Faster with Stochastic Momentum
Stochastic momentum improves deep network training because it modifies SGD to escape saddle points faster and, consequently, to more quickly find a second-order stationary point.
How to Escape Saddle Points Efficiently
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension, which shows that perturbed gradient descent can escape saddle points almost for free.
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.
Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics
Non-reversible Stochastic Gradient Langevin Dynamics (NSGLD), based on discretization of the non-reversible Langevin diffusion, is studied, and finite-time performance bounds are provided for the global convergence of NSGLD on stochastic non-convex optimization problems.
Gradient Descent Can Take Exponential Time to Escape Saddle Points
This paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods; it applies this algorithm to deep and recurrent neural network training and provides numerical evidence for its superior optimization performance.
On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points
Perturbed versions of GD and SGD are analyzed, and it is shown that they are truly efficient: their dimension dependence is only polylogarithmic.
Accelerating Nonconvex Learning via Replica Exchange Langevin diffusion
This work theoretically analyzes the acceleration effect of replica exchange from two perspectives, convergence in χ²-divergence and the large deviation principle, and obtains a discrete-time algorithm by discretizing the replica exchange Langevin diffusion.
Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
This paper identifies the strict saddle property for non-convex problems, which allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.