Corpus ID: 219177461

Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points

@article{Dixit2020ExitTA,
  title={Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points},
  author={Rishabh Dixit and Waheed Uz Zaman Bajwa},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.01106}
}
This paper considers the problem of understanding the exit time of trajectories of gradient-related first-order methods from saddle neighborhoods under certain initial boundary conditions. Given the "flat" geometry around saddle points, first-order methods can struggle to escape these regions quickly because of the small gradient magnitudes encountered there. In particular, while it is known that gradient-related first-order methods escape strict-saddle neighborhoods, existing literature…
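To make the slow-escape phenomenon concrete, here is a toy sketch (not from the paper; the quadratic objective, step size, and neighborhood radius are illustrative assumptions) that counts how many vanilla gradient-descent iterations are needed to leave a ball around the strict saddle of f(x, y) = (x^2 - y^2)/2 when the initialization lies close to the saddle's stable manifold:

```python
# Toy illustration (not from the paper): exit time of gradient descent from a
# neighborhood of the strict saddle of f(x, y) = 0.5*(x**2 - y**2) at the origin.
# The y-axis is the escape (negative-curvature) direction.

def gd_exit_time(x0, y0, step=0.1, radius=1.0, max_iters=10_000):
    """Iterations until the iterate leaves the ball of the given radius."""
    x, y = x0, y0
    for t in range(max_iters):
        if x * x + y * y > radius ** 2:
            return t
        # gradient of 0.5*(x^2 - y^2) is (x, -y)
        x, y = x - step * x, y + step * y
    return max_iters  # did not escape within the budget


# The closer the initialization lies to the stable manifold (y = 0),
# the longer the trajectory lingers in the saddle neighborhood.
for y0 in (1e-1, 1e-4, 1e-8):
    print(f"y0 = {y0:g}: exit after {gd_exit_time(0.5, y0)} iterations")
```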

Citations

Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm
TLDR
A simple variant of the vanilla gradient descent algorithm is proposed, termed Curvature Conditioned Regularized Gradient Descent (CCRGD) algorithm, which utilizes a check for an initial boundary condition to ensure its trajectories can escape strict-saddle neighborhoods in linear time.
Linear Regularizers Enforce the Strict Saddle Property
TLDR
It is demonstrated that regularizing a function with a linear term enforces the strict saddle property, and a selection rule is shown to guarantee that gradient descent will escape the neighborhoods around a broad class of non-strict saddle points.
A Linearly Convergent Algorithm for Distributed Principal Component Analysis

References

SHOWING 1-10 OF 43 REFERENCES
Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm
TLDR
A simple variant of the vanilla gradient descent algorithm is proposed, termed Curvature Conditioned Regularized Gradient Descent (CCRGD) algorithm, which utilizes a check for an initial boundary condition to ensure its trajectories can escape strict-saddle neighborhoods in linear time.
How to Escape Saddle Points Efficiently
TLDR
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension, which shows that perturbed gradient descent can escape saddle points almost for free.
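The perturbation idea can be sketched in a few lines; the version below is a simplified illustration under assumed constants, not the paper's exact algorithm (which prescribes the perturbation radius, step size, and escape-phase conditions precisely):

```python
# Simplified sketch of perturbed gradient descent: when the gradient is small
# (possibly near a saddle), add a small random perturbation drawn uniformly
# from a ball so the iterate is pushed off the flat region.
import numpy as np


def perturbed_gd(grad, x0, step=1e-2, g_thresh=1e-3, pert_radius=1e-2,
                 n_iters=1_000, rng=np.random.default_rng(0)):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        if np.linalg.norm(grad(x)) <= g_thresh:
            u = rng.normal(size=x.shape)            # uniform direction ...
            u *= pert_radius * rng.uniform() ** (1 / x.size) / np.linalg.norm(u)
            x = x + u                               # ... uniform radius in the ball
        x = x - step * grad(x)
    return x


# Example: f(x, y) = 0.5*(x**2 - y**2). Plain GD started exactly on the stable
# manifold (y = 0) never leaves it; the perturbation breaks that symmetry.
print(perturbed_gd(lambda z: np.array([z[0], -z[1]]), [0.5, 0.0]))
```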
Behavior of accelerated gradient methods near critical points of nonconvex functions
TLDR
It is shown by means of the stable manifold theorem that the heavy-ball method is unlikely to converge to strict saddle points, which are points at which the gradient of the objective is zero but the Hessian has at least one negative eigenvalue.
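For reference, the heavy-ball iteration analyzed in that work combines a gradient step with a momentum term; a minimal sketch (with illustrative, assumed constants) is:

```python
# Minimal sketch of the heavy-ball (momentum) update: a gradient step plus a
# multiple of the previous displacement. Constants here are illustrative.
import numpy as np


def heavy_ball(grad, x0, step=0.05, momentum=0.9, n_iters=500):
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x
```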
Revisiting Normalized Gradient Descent: Fast Evasion of Saddle Points
TLDR
A global convergence-time bound is established for NGD under mild assumptions; it is shown that NGD “almost never” converges to saddle points and that the time required to escape from a ball of given radius around a saddle point is small.
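The normalized update in question replaces the raw gradient with its unit vector, so the step length does not shrink in the flat region around a saddle; a minimal sketch of one such step (constants are assumptions):

```python
# One normalized gradient descent (NGD) step: a fixed-length move along the
# negative unit gradient; eps guards against division by zero.
import numpy as np


def ngd_step(x, grad, step=1e-2, eps=1e-12):
    g = grad(x)
    return x - step * g / (np.linalg.norm(g) + eps)
```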
A Generic Approach for Escaping Saddle points
TLDR
A generic framework is introduced that minimizes Hessian based computations while at the same time provably converging to second-order critical points, and yields convergence results competitive to the state-of-the-art.
Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
TLDR
To the best of the authors' knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in the setting of finding a first-order stationary point.
First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time
TLDR
A novel perspective on the noise-adding technique is presented, namely that adding noise to the first-order information can help extract negative curvature from the Hessian matrix, and formal reasoning for this perspective is provided by analyzing a simple first-order procedure.
Efficient approaches for escaping higher order saddle points in non-convex optimization
TLDR
This paper designs the first efficient algorithm guaranteed to converge to a third-order local optimum (while existing techniques are at most second-order), and shows that it is NP-hard to extend this further to finding fourth-order local optima.
Gradient Descent Can Take Exponential Time to Escape Saddle Points
TLDR
This paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape.
First-order Methods Almost Always Avoid Saddle Points
TLDR
It is established that first-order methods avoid saddle points for almost all initializations; neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
...