# On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization

```bibtex
@article{Liu2017OnNN,
  title   = {On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization},
  author  = {Mingrui Liu and Tianbao Yang},
  journal = {arXiv: Optimization and Control},
  year    = {2017}
}
```

The Hessian-vector product has been utilized to find a second-order stationary solution with strong complexity guarantees (e.g., almost linear time complexity in the problem's dimensionality). In this paper, we propose to further reduce the number of Hessian-vector products for faster non-convex optimization. Previous algorithms need to approximate the smallest eigenvalue with sufficient precision (e.g., $\epsilon_2\ll 1$) in order to achieve a sufficiently accurate second-order stationary…
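To make the abstract's central primitive concrete, here is a minimal sketch of how a Hessian-vector product (computed here by finite differences of the gradient, so the Hessian is never formed) can drive the extraction of a negative curvature direction via a shifted power method. The function names, the toy objective, and the iteration count are illustrative, not the paper's algorithm; `L` is assumed to upper-bound the Hessian's spectral norm.

```python
import numpy as np

def hvp(grad, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H(x) @ v via central
    finite differences of the gradient (no explicit Hessian needed)."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

def negative_curvature_direction(grad, x, L, iters=200, seed=0):
    """Shifted power iteration on M = L*I - H, whose top eigenvector
    corresponds to the smallest eigenvalue of H."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = L * v - hvp(grad, x, v)   # one Hessian-vector product per step
        v /= np.linalg.norm(v)
    lam = v @ hvp(grad, x, v)          # Rayleigh quotient: estimate of lambda_min(H)
    return v, lam

# Toy saddle: f(x, y) = x^2 - y^2 has Hessian diag(2, -2).
grad = lambda z: np.array([2 * z[0], -2 * z[1]])
v, lam = negative_curvature_direction(grad, np.zeros(2), L=3.0)
print(lam)  # close to -2, the smallest Hessian eigenvalue
```

The paper's contribution concerns how coarsely `lam` needs to be estimated (i.e., how few such products suffice); this sketch only shows the mechanism being counted.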

## 19 Citations

NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization

- Computer Science
- 2017

By leveraging the proposed AG methods for extracting negative curvature, this work presents a new AG algorithm with double loops for non-convex optimization, improving the iteration complexity of the gradient descent method by a factor of $\epsilon^{-0.25}$ and matching the best iteration complexity of second-order Hessian-free methods for non-convex optimization.

Sample Complexity of Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization

- Computer Science, Mathematics
- AISTATS
- 2019

A stochastic variance-reduced cubic-regularized (SVRC) Newton's method under both sampling with and without replacement schemes, which improves the sample complexity of CR as well as of other sub-sampled variants via the variance reduction scheme.

Convergence of Cubic Regularization for Nonconvex Optimization under KL Property

- Computer Science, Mathematics
- NeurIPS
- 2018

The asymptotic convergence rate of CR is explored by exploiting the ubiquitous Kurdyka-Lojasiewicz (KL) property of nonconvex objective functions, in terms of the function value gap, variable distance gap, gradient norm, and least eigenvalue of the Hessian matrix.

Cubic Regularization with Momentum for Nonconvex Optimization

- Computer Science
- UAI
- 2019

Theoretically, it is proved that CR with momentum achieves the best possible convergence rate to a second-order stationary point for nonconvex optimization, and the proposed algorithm can allow computational inexactness that reduces the overall sample complexity without degrading the convergence rate.

Escaping Saddle Points in Nonconvex Minimax Optimization via Cubic-Regularized Gradient Descent-Ascent

- Computer Science
- ArXiv
- 2021

Cubic-GDA is developed, the first GDA-type algorithm for escaping strict saddle points in nonconvex-strongly-concave minimax optimization; it achieves an order-wise faster convergence rate than standard GDA for a wide spectrum of gradient-dominant geometries.

On the Second-order Convergence Properties of Random Search Methods

- Computer Science, Mathematics
- NeurIPS
- 2021

A novel variant of random search that exploits negative curvature by only relying on function evaluations is proposed, and it is proved that this approach converges to a second-order stationary point at a much faster rate than vanilla methods.

A Subsampling Line-Search Method with Second-Order Results.

- Computer Science
- 2018

A stochastic algorithm based on negative curvature and Newton-type directions that are computed for a subsampling model of the objective is described, which encompasses the deterministic regime, and allows us to identify sampling requirements for second-order line-search paradigms.

First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time

- Computer Science
- NeurIPS
- 2018

A novel perspective on the noise-adding technique is presented: adding noise to the first-order information can help extract the negative curvature of the Hessian matrix. A formal justification of this perspective is provided by analyzing a simple first-order procedure.

Exploiting negative curvature in deterministic and stochastic optimization

- Computer Science
- Math. Program.
- 2019

New frameworks for combining descent and negative curvature directions are presented: alternating two-step approaches and dynamic step approaches that make algorithmic decisions based on (estimated) upper-bounding models of the objective function.

Regional complexity analysis of algorithms for nonconvex smooth optimization

- Computer Science
- Math. Program.
- 2021

A strategy is proposed for characterizing the worst-case performance of algorithms for solving nonconvex smooth optimization problems over regions defined by first- and second-order derivatives and for analyzing the behavior of higher-order algorithms.

## References

Showing 1-10 of 29 references

Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step

- Mathematics, Computer Science
- ArXiv
- 2016

We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions,…

Accelerated Methods for Non-Convex Optimization

- Computer Science
- ArXiv
- 2016

The method improves upon the complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$.
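The guarantee quoted here, $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ together with a small gradient, is the approximate second-order stationarity condition shared by most of the works on this page. A minimal checker makes the definition concrete; the function name and the explicit-Hessian interface are illustrative (the papers themselves avoid forming the Hessian):

```python
import numpy as np

def is_second_order_stationary(grad_x, hess_x, eps):
    """Check approximate second-order stationarity:
    ||grad f(x)|| <= eps  and  lambda_min(Hess f(x)) >= -sqrt(eps)."""
    lam_min = np.linalg.eigvalsh(hess_x).min()
    return bool(np.linalg.norm(grad_x) <= eps and lam_min >= -np.sqrt(eps))

# At the saddle of f(x, y) = x^2 - y^2 the gradient vanishes, but
# lambda_min = -2 violates the curvature condition for small eps.
H = np.diag([2.0, -2.0])
print(is_second_order_stationary(np.zeros(2), H, eps=1e-2))  # False
```

A first-order stationary point only needs the gradient condition, which is why it can sit at a saddle; the extra eigenvalue condition is what the negative-curvature machinery above is buying.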

How to Escape Saddle Points Efficiently

- Computer Science, Mathematics
- ICML
- 2017

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on the dimension, which shows that perturbed gradient descent can escape saddle points almost for free.
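The mechanism in that paper can be sketched in a few lines: run plain gradient descent, and whenever the gradient is small (a possible saddle), inject a small random perturbation so the iterate drifts out along any negative curvature direction. This is a toy sketch, not the paper's algorithm; the constants, the perturbation schedule, and the test objective are all illustrative.

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.05, g_thresh=1e-3, radius=1e-2,
                 steps=2000, seed=0):
    """Gradient descent that perturbs the iterate whenever the gradient
    is nearly zero, so it does not stall at strict saddle points."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh:
            x = x + radius * rng.standard_normal(x.shape)  # kick at flat points
        else:
            x = x - eta * g
    return x

# f(x, y) = x^2 - y^2 + y^4 has a strict saddle at the origin and
# minima at y = +/- 1/sqrt(2).  Plain GD started exactly at the saddle
# would stay stuck forever; the perturbed variant escapes along y.
grad = lambda z: np.array([2 * z[0], -2 * z[1] + 4 * z[1] ** 3])
x = perturbed_gd(grad, np.zeros(2))
```

After the run, `x` sits near one of the two minima rather than at the saddle, illustrating the "almost for free" escape the summary describes.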

Sub-sampled Cubic Regularization for Non-convex Optimization

- Computer Science, Mathematics
- ICML
- 2017

This work provides a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods; it is the first work to give global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions.

Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition

- Computer Science, Mathematics
- COLT
- 2015

This paper identifies a strict saddle property for non-convex problems that allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.

Finding approximate local minima faster than gradient descent

- Computer Science
- STOC
- 2017

We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of…

Gradient methods for minimizing composite objective function

- Computer Science, Mathematics
- 2007

In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and…

Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization

- Computer Science
- Math. Program.
- 2016

A randomized stochastic projected gradient (RSPG) algorithm is proposed, in which a proper mini-batch of samples is taken at each iteration depending on the total budget of stochastic samples allowed; the algorithm is shown to have nearly optimal complexity for convex stochastic programming.

Complexity Analysis of Second-Order Line-Search Algorithms for Smooth Nonconvex Optimization

- Computer Science
- SIAM J. Optim.
- 2018

This paper presents an algorithm with favorable complexity properties that differs in two significant ways from other recently proposed methods and is based on line searches only: each step involves computation of a search direction, followed by a backtracking line search along that direction.

Even Faster SVD Decomposition Yet Without Agonizing Pain

- Computer Science
- NIPS
- 2016

A new framework for SVD is put forward, yielding the first accelerated and stochastic method that outperforms [2] in the running-time regime and in certain parameter regimes, without even using alternating minimization.