# Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step

```bibtex
@article{Carmon2016GradientDE,
  title   = {Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step},
  author  = {Yair Carmon and John C. Duchi},
  journal = {ArXiv},
  volume  = {abs/1612.00547},
  year    = {2016}
}
```

We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the $\textit{global minimum}$ to within $\varepsilon$ accuracy in $O(\varepsilon^{-1}\log(1/\varepsilon))$ steps for large $\varepsilon$ and $O(\log(1/\varepsilon))$ steps for small $\varepsilon$ (compared to a condition number we define), with at most logarithmic…
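The objective studied here is $f(x) = \tfrac{1}{2} x^\top A x + b^\top x + \tfrac{\rho}{3}\|x\|^3$, the subproblem of the Nesterov–Polyak cubic-regularized Newton method. A minimal sketch of running plain gradient descent on a small random instance (the matrix, vector, step size, and iteration count are illustrative assumptions, not the paper's tuned parameters):

```python
import numpy as np

# f(x) = 1/2 x^T A x + b^T x + (rho/3) ||x||^3, with A indefinite.
# Stationary points satisfy (A + rho*||x|| I) x = -b; the global
# minimizer additionally has A + rho*||x|| I positive semidefinite.
rng = np.random.default_rng(0)
n = 5
Q = rng.standard_normal((n, n))
A = (Q + Q.T) / 2                  # symmetric, generally indefinite
b = rng.standard_normal(n)
rho = 1.0

def f(x):
    return 0.5 * x @ A @ x + b @ x + (rho / 3) * np.linalg.norm(x) ** 3

def grad(x):
    return A @ x + b + rho * np.linalg.norm(x) * x

x = np.zeros(n)                    # the first step then moves along -b
eta = 0.02                         # small constant step (illustrative)
for _ in range(10000):
    x = x - eta * grad(x)
```

Starting at the origin means the first iterate moves along $-b$, which matches the initialization direction the paper's analysis uses; despite the saddle points of $f$, the iterates settle at a point with near-zero gradient and negative objective value.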

#### 79 Citations

Randomized Block Cubic Newton Method

- Mathematics, Computer Science
- ICML
- 2018

RBCN is the first algorithm with these properties, generalizing several existing methods, matching the best known bounds in all special cases, and outperforming the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.

On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization

- Mathematics
- 2017

The Hessian-vector product has been utilized to find a second-order stationary solution with strong complexity guarantee (e.g., almost linear time complexity in the problem's dimensionality). In this…

Stochastic Variance-Reduced Cubic Regularized Newton Method

- Computer Science, Mathematics
- ICML
- 2018

This work shows that the proposed stochastic variance-reduced cubic regularized Newton method is guaranteed to converge to an approximately local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms, including subsampled cubic regularization.

Cubic Regularized ADMM with Convergence to a Local Minimum in Non-convex Optimization

- Computer Science, Mathematics
- 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2019

This paper proposes the Cubic Regularized Alternating Direction Method of Multipliers (CR-ADMM) to escape saddle points of separable non-convex functions containing a non-Hessian-Lipschitz component, and proves that CR-ADMM converges to a local minimum of the original function at a rate of $O(1/T^{1/3})$ in time horizon $T$, which is faster than gradient-based methods.

On Adaptive Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling

- Mathematics
- 2018

In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, though overall the objective is assumed to be smooth and…

Escaping Saddle-Points Faster under Interpolation-like Conditions

- Computer Science, Mathematics
- NeurIPS
- 2020

In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle points and converge to local minimizers much faster. One of the fundamental…

Stochastic Cubic Regularization for Fast Nonconvex Optimization

- Computer Science, Mathematics
- NeurIPS
- 2018

The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\mathcal{\tilde{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations.

Efficiently avoiding saddle points with zero order methods: No gradients required

- Computer Science, Mathematics
- NeurIPS
- 2019

This work establishes asymptotic convergence to second-order stationary points for derivative-free algorithms for non-convex optimization that use only function evaluations rather than gradients, via a carefully tailored application of the Stable Manifold Theorem.

Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima

- Computer Science, Mathematics
- NeurIPS
- 2018

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape…

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

- Mathematics, Computer Science
- COLT
- 2019

A sharp analysis for Stochastic Gradient Descent (SGD) is given, and it is proved that SGD is able to efficiently escape from saddle points and find an approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions.

#### References

Showing 1–10 of 29 references

Accelerated Methods for Non-Convex Optimization

- Computer Science, Mathematics
- ArXiv
- 2016

The method improves upon the complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$.

Accelerated Methods for NonConvex Optimization

- Computer Science, Mathematics
- SIAM J. Optim.
- 2018

This work presents an accelerated gradient method for nonconvex optimization problems with Lipschitz continuous first and second derivatives that is Hessian free, i.e., it only requires gradient computations, and is therefore suitable for large-scale applications.

Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition

- Mathematics, Computer Science
- COLT
- 2015

This paper identifies the strict saddle property for non-convex problems that allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
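The strict saddle property requires every saddle point to have a direction of strictly negative curvature, which gradient noise can exploit. A toy illustration (the function, step size, and noise scale are invented for this sketch and far simpler than tensor decomposition): for $f(x) = (\|x\|^2 - 1)^2$, the origin is a strict saddle point (in fact a local maximum, with Hessian $-4I$), and noisy gradient descent started exactly there escapes to the sphere of global minima $\|x\| = 1$.

```python
import numpy as np

# f(x) = (||x||^2 - 1)^2: the gradient vanishes at the origin, where
# the Hessian is -4I (strict negative curvature); the global minima
# form the unit sphere ||x|| = 1.
rng = np.random.default_rng(1)

def grad(x):
    return 4.0 * (x @ x - 1.0) * x

x = np.zeros(3)                  # start exactly at the saddle
eta, sigma = 0.05, 0.01          # step size and noise scale (illustrative)
for _ in range(2000):
    x = x - eta * (grad(x) + sigma * rng.standard_normal(3))

print(np.linalg.norm(x))         # close to 1: the noise escaped the saddle
```

Plain gradient descent would remain stuck at the origin forever; the injected noise pushes the iterate onto a negative-curvature direction, after which the radial dynamics amplify the displacement exponentially.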

Finding approximate local minima faster than gradient descent

- Computer Science, Mathematics
- STOC
- 2017

We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of…

Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results

- Mathematics, Computer Science
- Math. Program.
- 2011

An Adaptive Regularisation algorithm using Cubics (ARC) is proposed for unconstrained optimization, generalizing at the same time an unpublished method due to Griewank, an algorithm by Nesterov and Polyak and a proposal by Weiser et al.

Finding Approximate Local Minima for Nonconvex Optimization in Linear Time

- Computer Science, Mathematics
- ArXiv
- 2016

A non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which is linear in the input representation, and applies to a general class of optimization problems including training a neural network and other non-convex objectives arising in machine learning.

Complexity bounds for second-order optimality in unconstrained optimization

- Computer Science, Mathematics
- J. Complex.
- 2012

It is shown that a comparison of the bounds on the worst-case behaviour of the cubic regularization and trust-region algorithms favours the first of these methods.

A linear-time algorithm for trust region problems

- Mathematics, Computer Science
- Math. Program.
- 2016

This work gives the first provable linear-time (in the number of non-zero entries of the input) algorithm for approximately solving the fundamental problem of minimizing a general quadratic function over an ellipsoidal domain.
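The trust-region subproblem in its simplest form is $\min_{\|x\| \le r} \tfrac{1}{2} x^\top A x + b^\top x$ with $A$ possibly indefinite. The snippet below is a minimal projected-gradient sketch on a small random instance, not the linear-time algorithm of the paper; the instance and parameters are illustrative assumptions:

```python
import numpy as np

# Trust-region subproblem: minimize 1/2 x^T A x + b^T x over ||x|| <= r.
rng = np.random.default_rng(2)
n = 5
Q = rng.standard_normal((n, n))
A = (Q + Q.T) / 2                  # symmetric, possibly indefinite
b = rng.standard_normal(n)
r = 1.0

def f(x):
    return 0.5 * x @ A @ x + b @ x

def project(x):
    nx = np.linalg.norm(x)
    return x if nx <= r else (r / nx) * x   # project onto the ball

x0 = project(-b)                   # feasible starting point
x = x0.copy()
eta = 0.05                         # below 1/||A||, so each step descends
for _ in range(5000):
    x = project(x - eta * (A @ x + b))
```

Each projected step decreases the objective as long as the step size is below the reciprocal of the gradient's Lipschitz constant $\|A\|$; the cited works obtain much stronger guarantees (near-optimality in linear time) via eigenvector and Krylov-subspace machinery rather than this naive iteration.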

A Second-Order Cone Based Approach for Solving the Trust-Region Subproblem and Its Variants

- Mathematics, Computer Science
- SIAM J. Optim.
- 2017

This study highlights an explicit connection between the classical nonconvex TRS and smooth convex quadratic minimization, which allows for the application of cheap iterative methods, such as Nesterov's accelerated gradient descent, to the TRS.

On the use of iterative methods in cubic regularization for unconstrained optimization

- Mathematics, Computer Science
- Comput. Optim. Appl.
- 2015

This paper introduces a new stopping criterion in order to properly manage the “over-solving” issue arising whenever the cubic model is not an adequate model of the true objective function.