Corpus ID: 17905941

Gradient Descent Converges to Minimizers

@article{Lee2016GradientDC,
  title={Gradient Descent Converges to Minimizers},
  author={J. Lee and Max Simchowitz and Michael I. Jordan and Benjamin Recht},
  journal={ArXiv},
  year={2016},
  volume={abs/1602.04915}
}
We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory. 
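As a concrete (and entirely illustrative) sanity check of this statement, the sketch below runs gradient descent from random initializations on the toy function f(x, y) = (x^2 - 1)^2 + y^2, which has a strict saddle at the origin and minimizers at (+-1, 0); the function, step size, and trial counts are my own choices, not the paper's.

```python
import numpy as np

# Toy check of the almost-sure avoidance of strict saddles (illustrative, not
# code from the paper): f(x, y) = (x^2 - 1)^2 + y^2 has a strict saddle at
# (0, 0) and minimizers at (+-1, 0).

def grad(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

rng = np.random.default_rng(0)
step = 0.02                                # small enough for the region visited
trials, near_saddle = 200, 0
for _ in range(trials):
    p = rng.uniform(-2.0, 2.0, size=2)     # random initialization
    for _ in range(1000):
        p = p - step * grad(p)
    if np.linalg.norm(p) < 1e-3:           # did the run end at the saddle?
        near_saddle += 1

print(f"runs converging to the saddle: {near_saddle} / {trials}")   # expect 0
```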
Gradient Descent Converges to Minimizers: The Case of Non-Isolated Critical Points
We prove that the set of initial conditions from which gradient descent converges to strict saddle points has (Lebesgue) measure zero, even for non-isolated critical points, answering an open question.
Block Coordinate Descent Only Converge to Minimizers
Given a non-convex twice continuously differentiable cost function with Lipschitz continuous gradient, we prove that all of block coordinate gradient descent, block mirror descent and proximal block…
Block Coordinate Descent Almost Surely Converges to a Stationary Point Satisfying the Second-order Necessary Condition
Given a non-convex twice continuously differentiable cost function with Lipschitz continuous gradient, we prove that all of the block coordinate gradient descent, block mirror descent and proximal…
Gradient Descent Learns Linear Dynamical Systems
We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy…
Correction: Beyond convexity—Contraction and global convergence of gradient descent
Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
It is proved that the set of initial conditions from which gradient descent converges to saddle points where the Hessian of f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [12].
Certain Systems Arising in Stochastic Gradient Descent
Stochastic approximation is a rich branch of probability theory with a wide range of applications…
Competitive Gradient Descent
We introduce a new algorithm for the numerical computation of Nash equilibria of competitive two-player games. Our method is a natural generalization of gradient descent to the two-player setting…
An Improved Adagrad Gradient Descent Optimization Algorithm
The results show that the proposed improved Adagrad gradient descent optimization algorithm has a more stable convergence process and can reduce overfitting in multiple epochs.
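For orientation, here is a minimal sketch of the standard Adagrad update that such variants build on; the learning rate, epsilon, and test function are illustrative choices, and the paper's specific modifications are not reproduced.

```python
import numpy as np

# Standard Adagrad sketch (for context only; the cited paper's modified
# update is not reproduced here).
def adagrad(grad_fn, x0, lr=0.5, eps=1e-8, num_steps=1000):
    x = np.asarray(x0, dtype=float)
    g_sq_sum = np.zeros_like(x)                      # running sum of squared gradients
    for _ in range(num_steps):
        g = grad_fn(x)
        g_sq_sum += g * g
        x = x - lr * g / (np.sqrt(g_sq_sum) + eps)   # per-coordinate adaptive step
    return x

# Example: a badly scaled quadratic f(x) = 0.5 * (x1^2 + 100 * x2^2).
x_final = adagrad(lambda x: np.array([1.0, 100.0]) * x, x0=[5.0, 5.0])
print(x_final)   # both coordinates shrink toward 0 at comparable rates
```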
Asymptotic Escape of Spurious Critical Points on the Low-rank Matrix Manifold
It is shown that using the dynamical low-rank approximation and a rescaled gradient flow, some of the spurious critical points can be converted to classical strict saddle points, which leads to the desired result.

References

Showing 1-10 of 44 references
Cubic regularization of Newton method and its global performance
This paper provides a theoretical analysis of a cubic regularization of the Newton method applied to the unconstrained minimization problem and proves general local convergence results for this scheme.
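As a rough illustration of the scheme analyzed in this reference, the sketch below takes cubic-regularized Newton steps on a small toy function; for brevity the cubic subproblem is solved with a generic optimizer from random starts rather than the paper's dedicated procedure, and the constants (M, the test function, the number of steps) are my own choices.

```python
import numpy as np
from scipy.optimize import minimize

# One cubic-regularized Newton step minimizes the model
#     m(s) = g.T s + 0.5 * s.T H s + (M / 6) * ||s||^3,
# where M upper-bounds the Lipschitz constant of the Hessian.  Here the
# subproblem is solved with a generic optimizer from a few random starts
# (an illustration only, not the dedicated solver analyzed in the paper).

def cubic_newton_step(g, H, M, rng, n_starts=5):
    def model(s):
        return g @ s + 0.5 * s @ H @ s + (M / 6.0) * np.linalg.norm(s) ** 3
    best = None
    for _ in range(n_starts):
        res = minimize(model, rng.normal(size=g.shape), method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best.x

# Toy function f(x, y) = (x^2 - 1)^2 + y^2, started near its strict saddle.
def grad(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def hessian(p):
    x, _ = p
    return np.diag([12.0 * x**2 - 4.0, 2.0])

rng = np.random.default_rng(0)
p = np.array([0.0, 1.0])       # gradient in x is zero, x-curvature is negative
for _ in range(10):
    p = p + cubic_newton_step(grad(p), hessian(p), M=24.0, rng=rng)
print(p)   # leaves the saddle region and lands near a minimizer (+-1, 0)
```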
Global Stability of Dynamical Systems
Contents: 1. Generalities. 2. Filtrations. 3. Sequences of Filtrations. 4. Hyperbolic Sets. 5. Stable Manifolds. 6. Stable Manifolds for Hyperbolic Sets. 7. More Consequences of Hyperbolicity. 8. Stability. 9. …
Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
This paper identifies a strict saddle property for non-convex problems that allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
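A toy illustration of the escape mechanism (not the tensor-decomposition setting of the paper): gradient descent with injected noise, started exactly at a strict saddle where noiseless gradient descent would stay put; the function, step size, and noise scale are illustrative.

```python
import numpy as np

# Noise-injected gradient descent started exactly at the strict saddle (0, 0)
# of f(x, y) = (x^2 - 1)^2 + y^2.  Noiseless gradient descent would stay there
# forever; the noise pushes the iterate off the saddle's stable set, after
# which it converges to a minimizer.

def grad(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

rng = np.random.default_rng(1)
p = np.zeros(2)                              # start exactly at the saddle
step = 0.02
for _ in range(3000):
    noise = rng.normal(scale=0.01, size=2)   # stand-in for stochastic gradient noise
    p = p - step * (grad(p) + noise)

print(p)   # ends close to (+1, 0) or (-1, 0)
```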
Convergence of the Iterates of Descent Methods for Analytic Cost Functions
It is shown that the iterates of numerical descent algorithms for an analytic cost function share this convergence property (convergence of the whole sequence to a single limit point) provided they satisfy certain natural descent conditions. The results strengthen classical "weak convergence" results for descent methods to "strong limit-point convergence" for a large class of cost functions of practical interest.
Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods
This work proves an abstract convergence result for descent methods that satisfy a sufficient-decrease assumption and allow a relative error tolerance; it guarantees the convergence of bounded sequences under the assumption that the function f satisfies the Kurdyka–Łojasiewicz inequality.
On the saddle point problem for non-convex optimization
It is argued, based on results from statistical physics, random matrix theory, and neural network theory, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high-dimensional problems of practical interest.
Newton-type methods for unconstrained and linearly constrained optimization
The methods are intimately based on the recurrence of matrix factorizations and are linked to earlier work on quasi-Newton methods and quadratic programming.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, which can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods; the algorithm is applied to deep and recurrent neural network training, with numerical evidence for its superior optimization performance.
Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality
A convergent proximal reweighted l1 algorithm for compressive sensing and an application to rank reduction problems are provided; the convergence analysis depends on the geometrical properties of the function L around its critical points.
On the use of directions of negative curvature in a modified Newton method
A modified Newton method for the unconstrained minimization problem is presented, and it is shown how the Bunch and Parlett decomposition of a symmetric indefinite matrix can be used to give entirely adequate directions of negative curvature.