Corpus ID: 88522834

On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization

@article{Liu2017OnNN,
  title={On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization},
  author={Mingrui Liu and Tianbao Yang},
  journal={arXiv: Optimization and Control},
  year={2017}
}
The Hessian-vector product has been utilized to find a second-order stationary solution with a strong complexity guarantee (e.g., almost linear time complexity in the problem's dimensionality). In this paper, we propose to further reduce the number of Hessian-vector products for faster non-convex optimization. Previous algorithms need to approximate the smallest eigenvalue with a sufficient precision (e.g., $\epsilon_2\ll 1$) in order to achieve a sufficiently accurate second-order stationary… 
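As context for the Hessian-vector-product approach the abstract refers to, here is a minimal sketch (not the paper's algorithm) of how such products alone can expose a negative-curvature direction: power iteration on the shifted operator $L I - \nabla^2 f(x)$ approximately returns the eigenvector of the smallest Hessian eigenvalue. The smoothness constant $L$, the iteration budget, and the toy quadratic are illustrative assumptions.

```python
# Sketch: estimate the most negative Hessian eigenvalue using only a
# Hessian-vector-product oracle hvp(v) = H(x) v, via power iteration on L*I - H(x).
# L (an upper bound on ||H(x)||), the iteration count, and the toy matrix below
# are assumptions for illustration.
import numpy as np

def negative_curvature_direction(hvp, dim, L=10.0, iters=200, rng=None):
    """Return a unit direction v and the Rayleigh-quotient estimate v' H v."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = L * v - hvp(v)           # apply the shifted operator (L*I - H) to v
        v = w / np.linalg.norm(w)    # renormalize; one Hessian-vector product per step
    return v, float(v @ hvp(v))

# Toy saddle with Hessian diag(2, -2): the smallest eigenvalue is -2 along e2.
H = np.diag([2.0, -2.0])
v, lam_min = negative_curvature_direction(lambda u: H @ u, dim=2)
print(lam_min)  # close to -2; descend along +/- v whenever lam_min < -eps_2
```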

Citations

NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization
TLDR
By leveraging the proposed AG methods for extracting the negative curvature, this work presents a new AG algorithm with double loops for non-convex optimization, improving the iteration complexity of the gradient descent method by a factor of $\epsilon^{-0.25}$ and matching the best iteration complexity of second-order Hessian-free methods for non-convex optimization.
A Deterministic Gradient-Based Approach to Avoid Saddle Points.
TLDR
This paper proposes a modification of the recently proposed Laplacian smoothing gradient descent (mLSGD), and demonstrates its potential to avoid saddle points without sacrificing the convergence rate.
Sample Complexity of Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization
TLDR
A stochastic variance-reduced cubic-regularized (SVRC) Newton's method under both sampling with and without replacement schemes, whose sample complexity improves upon that of CR as well as other sub-sampled variants via the variance reduction scheme.
Convergence of Cubic Regularization for Nonconvex Optimization under KL Property
TLDR
The asymptotic convergence rate of CR is explored by exploiting the ubiquitous Kurdyka-Lojasiewicz (KL) property of nonconvex objective functions, with rates characterized in terms of the function value gap, variable distance gap, gradient norm, and least eigenvalue of the Hessian matrix.
Cubic Regularization with Momentum for Nonconvex Optimization
TLDR
Theoretically, it is proved that CR under momentum achieves the best possible convergence rate to a second-order stationary point for nonconvex optimization, and the proposed algorithm can allow computational inexactness that reduces the overall sample complexity without degrading the convergence rate.
Escaping Saddle Points in Nonconvex Minimax Optimization via Cubic-Regularized Gradient Descent-Ascent
TLDR
Cubic-GDA, the first GDA-type algorithm for escaping strict saddle points in nonconvex-strongly-concave minimax optimization, is developed and achieves an order-wise faster convergence rate than the standard GDA for a wide spectrum of gradient-dominant geometries.
On the Second-order Convergence Properties of Random Search Methods
TLDR
A novel variant of random search that exploits negative curvature by only relying on function evaluations is proposed, and it is proved that this approach converges to a second-order stationary point at a much faster rate than vanilla methods.
A Subsampling Line-Search Method with Second-Order Results.
TLDR
A stochastic algorithm based on negative curvature and Newton-type directions that are computed for a subsampling model of the objective is described, which encompasses the deterministic regime, and allows us to identify sampling requirements for second-order line-search paradigms.
First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time
TLDR
A novel perspective on the noise-adding technique is presented, namely that adding noise to the first-order information can help extract the negative curvature from the Hessian matrix, and a formal reasoning of this perspective is provided by analyzing a simple first-order procedure (see the sketch after this list).
Exploiting negative curvature in deterministic and stochastic optimization
TLDR
New frameworks for combining descent and negative curvature directions are presented: alternating two-step approaches and dynamic step approaches that make algorithmic decisions based on (estimated) upper-bounding models of the objective function.
...
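As a rough illustration of the first-order idea summarized in the "First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time" entry above (a minimal sketch under illustrative assumptions, not that paper's exact NEON procedure), the Hessian-vector products in the earlier sketch can be emulated by a gradient difference, so only gradients plus a small random perturbation are needed; the finite-difference step r and the toy gradient are assumptions.

```python
# Sketch: extract negative curvature with first-order information only, by
# approximating H(x) v with a gradient difference and reusing the shifted power
# iteration from the sketch above. The step r, shift L, and toy gradient are
# illustrative assumptions, not the cited paper's exact procedure.
import numpy as np

def first_order_hvp(grad, x, v, r=1e-4):
    """Approximate H(x) v using only gradients: (grad(x + r v) - grad(x)) / r."""
    return (grad(x + r * v) - grad(x)) / r

grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])  # f(x) = x0^2 - x1^2, saddle at 0
x = np.zeros(2)

rng = np.random.default_rng(0)
v = rng.standard_normal(2)            # random (noise) initialization of the direction
v /= np.linalg.norm(v)
L = 10.0
for _ in range(200):                  # same shifted power iteration, gradients only
    w = L * v - first_order_hvp(grad, x, v)
    v = w / np.linalg.norm(w)
print(v @ first_order_hvp(grad, x, v))  # ~ -2: negative curvature found without a Hessian oracle
```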

References

Showing 1-10 of 29 references
Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step
We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, …
Accelerated Methods for Non-Convex Optimization
TLDR
The method improves upon the complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$.
How to Escape Saddle Points Efficiently
TLDR
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on the dimension, meaning that perturbed gradient descent can escape saddle points almost for free (a toy sketch of this idea appears after the reference list).
Sub-sampled Cubic Regularization for Non-convex Optimization
TLDR
This work provides a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods, and is the first work that gives global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions.
Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
TLDR
This paper identifies strict saddle property for non-convex problem that allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
Finding approximate local minima faster than gradient descent
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of …
Gradient methods for minimizing composite objective function
In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and …
Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization
TLDR
A randomized stochastic projected gradient (RSPG) algorithm, in which a proper mini-batch of samples is taken at each iteration depending on the total budget of stochastic samples allowed, is proposed and shown to achieve nearly optimal complexity for convex stochastic programming.
Complexity Analysis of Second-Order Line-Search Algorithms for Smooth Nonconvex Optimization
TLDR
This paper presents an algorithm with favorable complexity properties that differs in two significant ways from other recently proposed methods, based on line searches only: Each step involves computation of a search direction, followed by a backtracking line search along that direction.
Even Faster SVD Decomposition Yet Without Agonizing Pain
TLDR
A new framework for SVD is put forward, yielding the first accelerated and stochastic method, which outperforms [2] in running time and in certain parameter regimes without even using alternating minimization.
...
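As a toy illustration of the perturbed-gradient-descent idea summarized in the "How to Escape Saddle Points Efficiently" entry above (a sketch with made-up constants, not that paper's algorithm with its precise perturbation schedule), plain gradient descent is run and a small random perturbation is added whenever the gradient is small, letting the iterate slide off a strict saddle.

```python
# Sketch: perturbed gradient descent. Step size, noise radius, gradient threshold,
# and the test function are illustrative assumptions; the real algorithm also
# limits how often perturbations are applied.
import numpy as np

def perturbed_gradient_descent(grad, x0, eta=0.05, g_tol=1e-3, radius=1e-2,
                               steps=2000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) <= g_tol:
            # Near a first-order stationary point: perturb within a small ball.
            u = rng.standard_normal(x.shape)
            x = x + radius * u / np.linalg.norm(u)
        else:
            x = x - eta * g              # ordinary gradient step
    return x

# f(x) = x0^2 + (x1^2 - 1)^2 / 4 has a strict saddle at the origin and minima at x1 = +/- 1.
grad = lambda x: np.array([2.0 * x[0], x[1] ** 3 - x[1]])
print(perturbed_gradient_descent(grad, [0.0, 0.0], rng=np.random.default_rng(1)))
# ends near (0, +1) or (0, -1), i.e., the saddle at the origin is escaped
```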