Corpus ID: 6891595

Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step

@article{Carmon2016GradientDE,
  title={Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step},
  author={Yair Carmon and John C. Duchi},
  journal={ArXiv},
  year={2016},
  volume={abs/1612.00547}
}
We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the $\textit{global minimum}$ to within $\varepsilon$ accuracy in $O(\varepsilon^{-1}\log(1/\varepsilon))$ steps for large $\varepsilon$ and $O(\log(1/\varepsilon))$ steps for small $\varepsilon$ (compared to a condition number we define), with at most logarithmic …
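To make the setting concrete, below is a minimal sketch (not the authors' code) of plain gradient descent applied to a cubic-regularized quadratic $f(x) = b^\top x + \tfrac{1}{2}x^\top A x + \tfrac{\rho}{3}\|x\|^3$, the standard form of the problem the abstract describes. The problem data, step size, iteration budget, and initialization at the origin are illustrative choices; the paper's guarantees depend on step-size and initialization conditions that this sketch does not enforce.

import numpy as np

# Minimal sketch (not the authors' implementation): gradient descent on the
# cubic-regularized quadratic f(x) = b^T x + 0.5 x^T A x + (rho/3) ||x||^3.
# All problem data and hyperparameters below are illustrative choices.

def gradient(x, A, b, rho):
    # grad f(x) = b + A x + rho * ||x|| * x
    return b + A @ x + rho * np.linalg.norm(x) * x

def gradient_descent(A, b, rho, eta=1e-2, iters=20000, tol=1e-8):
    # Start at the origin; the paper's analysis uses specific initialization
    # and step-size conditions that this sketch does not enforce.
    x = np.zeros_like(b)
    for _ in range(iters):
        g = gradient(x, A, b, rho)
        if np.linalg.norm(g) < tol:
            break
        x = x - eta * g
    return x

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = (M + M.T) / 2          # symmetric and generally indefinite
b = rng.standard_normal(n)
x_hat = gradient_descent(A, b, rho=1.0)
print("final gradient norm:", np.linalg.norm(gradient(x_hat, A, b, 1.0)))
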
Citations

Randomized Block Cubic Newton Method
TLDR: RBCN is the first algorithm with these properties, generalizing several existing methods, matching the best known bounds in all special cases, and outperforming the state of the art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.
On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization
The Hessian-vector product has been utilized to find a second-order stationary solution with strong complexity guarantee (e.g., almost linear time complexity in the problem's dimensionality). In this …
Stochastic Variance-Reduced Cubic Regularized Newton Method
TLDR: This work shows that the proposed stochastic variance-reduced cubic regularized Newton method is guaranteed to converge to an approximate local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, which outperforms state-of-the-art cubic regularization algorithms, including subsampled cubic regularization.
Cubic Regularized ADMM with Convergence to a Local Minimum in Non-convex Optimization
  • Zai Shi, A. Eryilmaz
  • Computer Science, Mathematics
  • 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2019
TLDR: This paper proposes the Cubic Regularized Alternating Direction Method of Multipliers (CR-ADMM) to escape saddle points of separable non-convex functions containing a non-Hessian-Lipschitz component, and proves that CR-ADMM converges to a local minimum of the original function at a rate of $O(1/T^{1/3})$ over a time horizon $T$, which is faster than gradient-based methods.
On Adaptive Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling
In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, though overall the objective is assumed to be smooth and …
Escaping Saddle-Points Faster under Interpolation-like Conditions
In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle points and converge to local minimizers much faster. One of the fundamental …
Stochastic Cubic Regularization for Fast Nonconvex Optimization
TLDR: The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\mathcal{\tilde{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations.
Efficiently avoiding saddle points with zero order methods: No gradients required
TLDR: This work establishes asymptotic convergence to second-order stationary points, via a carefully tailored application of the Stable Manifold Theorem, for derivative-free algorithms for non-convex optimization that use only function evaluations rather than gradients.
Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima
We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape …
Sharp Analysis for Nonconvex SGD Escaping from Saddle Points
TLDR: A sharp analysis of Stochastic Gradient Descent (SGD) is given, proving that SGD efficiently escapes from saddle points and finds an approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions.

References

Showing 1-10 of 29 references
Accelerated Methods for Non-Convex Optimization
TLDR: The method improves upon the complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$.
Accelerated Methods for NonConvex Optimization
TLDR: This work presents an accelerated gradient method for nonconvex optimization problems with Lipschitz continuous first and second derivatives that is Hessian free, i.e., it only requires gradient computations, and is therefore suitable for large-scale applications.
Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
TLDR: This paper identifies the strict saddle property for non-convex problems, which allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
Finding approximate local minima faster than gradient descent
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of …
Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results
TLDR: An Adaptive Regularisation algorithm using Cubics (ARC) is proposed for unconstrained optimization, generalizing at the same time an unpublished method due to Griewank, an algorithm by Nesterov and Polyak, and a proposal by Weiser et al. (the cubic model such methods minimize at each iteration is written out after this reference list).
Finding Approximate Local Minima for Nonconvex Optimization in Linear Time
TLDR: A non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which is linear in the input representation and applies to a general class of optimization problems, including training a neural network and other non-convex objectives arising in machine learning.
Complexity bounds for second-order optimality in unconstrained optimization
TLDR: It is shown that a comparison of the bounds on the worst-case behaviour of the cubic regularization and trust-region algorithms favours the first of these methods.
A linear-time algorithm for trust region problems
TLDR: This work gives the first provable linear-time (in the number of non-zero entries of the input) algorithm for approximately solving the fundamental problem of minimizing a general quadratic function over an ellipsoidal domain.
A Second-Order Cone Based Approach for Solving the Trust-Region Subproblem and Its Variants
TLDR: This study highlights an explicit connection between the classical nonconvex trust-region subproblem (TRS) and smooth convex quadratic minimization, which allows for the application of cheap iterative methods, such as Nesterov's accelerated gradient descent, to the TRS.
On the use of iterative methods in cubic regularization for unconstrained optimization
TLDR: This paper introduces a new stopping criterion in order to properly manage the “over-solving” issue arising whenever the cubic model is not an adequate model of the true objective function.
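Several of the references above (the ARC papers, the method of Nesterov and Polyak that they generalize, and the iterative subproblem solvers) revolve around the same cubic-regularized second-order model. For context, a standard statement of that per-iteration subproblem, in generic notation rather than that of any single reference, is

\[
  s_k \in \arg\min_{s \in \mathbb{R}^n} \; m_k(s)
      = f(x_k) + g_k^{\top} s + \tfrac{1}{2}\, s^{\top} H_k s
        + \tfrac{\sigma_k}{3}\, \lVert s \rVert^{3},
  \qquad x_{k+1} = x_k + s_k,
\]

where $g_k = \nabla f(x_k)$, $H_k = \nabla^2 f(x_k)$, and $\sigma_k > 0$ is a regularization weight (adapted at each iteration in ARC-type methods). The headline paper analyzes running gradient descent directly on this subproblem, with $H_k$ a fixed, possibly indefinite matrix and $\sigma_k$ fixed.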