# Sub-sampled Cubic Regularization for Non-convex Optimization

```bibtex
@inproceedings{Kohler2017SubsampledCR,
  title     = {Sub-sampled Cubic Regularization for Non-convex Optimization},
  author    = {Jonas Moritz Kohler and Aur{\'e}lien Lucchi},
  booktitle = {ICML},
  year      = {2017}
}
```
• Published in ICML 2017
We consider the minimization of non-convex functions that typically arise in machine learning. Specifically, we focus our attention on a variant of trust region methods known as cubic regularization. This approach is particularly attractive because it escapes strict saddle points and provides stronger convergence guarantees than first- and second-order methods as well as classical trust region methods. However, it suffers from a high computational complexity that makes it impractical for large-scale…
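The sub-sampled cubic regularization idea from the abstract can be sketched in a few lines: at each step, build a gradient and Hessian estimate from a random batch of the finite sum, then take the step minimizing the cubically regularized local model. The sketch below is a simplified illustration under stated assumptions, not the paper's exact algorithm: the regularization weight `sigma` and batch size are fixed rather than adapted, the cubic subproblem is solved by plain gradient descent, and all function and parameter names (`scr`, `cubic_subproblem`, `grad_i`, `hess_i`) are hypothetical.

```python
import numpy as np

def cubic_subproblem(g, H, sigma, iters=500, lr=0.01):
    """Approximately minimize the local model
        m(s) = g^T s + 0.5 s^T H s + (sigma/3) * ||s||^3
    by gradient descent from s = 0 (a simple inexact solver)."""
    s = np.zeros_like(g)
    for _ in range(iters):
        grad_m = g + H @ s + sigma * np.linalg.norm(s) * s
        s -= lr * grad_m
    return s

def scr(grad_i, hess_i, n, x0, sigma=1.0, batch=32, steps=50, seed=None):
    """Sub-sampled cubic regularization sketch for a finite sum
    f(x) = (1/n) * sum_i f_i(x), where grad_i(i, x) and hess_i(i, x)
    return per-sample gradients and Hessians. Fixed sigma and fixed
    batch size, for simplicity."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        g = np.mean([grad_i(i, x) for i in idx], axis=0)  # sub-sampled gradient
        H = np.mean([hess_i(i, x) for i in idx], axis=0)  # sub-sampled Hessian
        x = x + cubic_subproblem(g, H, sigma)             # regularized Newton-type step
    return x
```

On a toy least-squares finite sum (convex, but illustrative of the finite-sum setting), the iterates contract toward the solution: the cubic term damps the step while it is large and its influence vanishes as the step shrinks, recovering Newton-like behavior near the optimum.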
#### 82 Citations
On Adaptive Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling
• 2018
In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, though overall the objective is assumed to be smooth and …
Newton-type methods for non-convex optimization under inexact Hessian information
• Math. Program., 2020
The canonical problem of finite-sum minimization is considered; appropriate uniform and non-uniform sub-sampling strategies are provided to construct Hessian approximations, and optimal iteration complexity is obtained for the corresponding sub-sampled trust region and adaptive cubic regularization methods.
Adaptively Accelerating Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling
• 2018
In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, though overall the objective is assumed to be smooth and …
Combining Stochastic Adaptive Cubic Regularization with Negative Curvature for Nonconvex Optimization
• J. Optim. Theory Appl., 2020
This is the first approach that combines the negative curvature method with the adaptive cubic-regularized Newton method, making the SANC algorithm more practical for solving large-scale machine learning problems.
Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient
• ArXiv, 2018
This paper studies a family of stochastic trust region and cubic regularization methods in which gradient, Hessian, and function values are computed inexactly, and shows that the iteration complexity required to achieve $\epsilon$-approximate second-order optimality is of the same order as in previous work where gradient and function values are computed exactly.
Stochastic Variance-Reduced Cubic Regularization Methods
• J. Mach. Learn. Res., 2019
A stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization is proposed, which is guaranteed to converge to an $(\epsilon, \sqrt{\epsilon})$-approximate local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, outperforming state-of-the-art cubic regularization algorithms including sub-sampled cubic regularization.
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
• SDM, 2020
Detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, on non-convex ML problems demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but are also highly robust to hyper-parameter settings.
Inexact Proximal Cubic Regularized Newton Methods for Convex Optimization
• 2019
In this paper, we use Proximal Cubic regularized Newton Methods (PCNM) to optimize the sum of a smooth convex function and a non-smooth convex function, where we use inexact gradient and Hessian, and …
Inexact Nonconvex Newton-Type Methods
• 2018
This work proposes inexact variants of trust region and adaptive cubic regularization methods which, to increase efficiency, incorporate various approximations, and it explores randomized sub-sampling as a way to construct the gradient and Hessian approximations.
Cubic Regularization with Momentum for Nonconvex Optimization
• Zhe Wang
• UAI, 2019
Theoretically, it is proved that cubic regularization (CR) with momentum achieves the best possible convergence rate to a second-order stationary point for nonconvex optimization, and that the proposed algorithm can tolerate computational inexactness that reduces the overall sample complexity without degrading the convergence rate.

#### References

*Showing 10 of 34 references.*
Escaping From Saddle Points: Online Stochastic Gradient for Tensor Decomposition
• COLT, 2015
This paper identifies the strict saddle property for non-convex problems, which allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
Convergence rates of sub-sampled Newton methods
• NIPS, 2015
This paper uses sub-sampling techniques together with low-rank approximation to design a new randomized batch algorithm which possesses a convergence rate comparable to Newton's method, yet has much smaller per-iteration cost.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
• NIPS, 2014
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods; the algorithm is applied to deep and recurrent neural network training, with numerical evidence for its superior optimization performance.
Convergence Rate Analysis of a Stochastic Trust Region Method for Nonconvex Optimization
• 2016
We introduce a variant of the traditional trust region method which is aimed at stochastic optimization. While the traditional trust region method relies on exact computations of the gradient and values of …
Finding Local Minima for Nonconvex Optimization in Linear Time
• 2016
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which is linear in the input representation. The previously fastest …
Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step
• ArXiv, 2016
We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, …
Fast Incremental Method for Nonconvex Optimization
• ArXiv, 2016
This paper analyzes the SAGA algorithm within an Incremental First-order Oracle framework, and shows that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent.
Global convergence rate analysis of unconstrained optimization methods based on probabilistic models
• Math. Program., 2018
It is shown that, in terms of the order of accuracy, the evaluation complexity of a line-search method based on random first-order models and directions is the same as that of its counterparts using deterministic accurate models; the use of probabilistic models increases the complexity only by a constant, which depends on the probability of the models being good.
Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results
• Math. Program., 2011
An Adaptive Regularisation algorithm using Cubics (ARC) is proposed for unconstrained optimization, generalizing at the same time an unpublished method due to Griewank, an algorithm by Nesterov and Polyak, and a proposal by Weiser et al.
Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming
• SIAM J. Optim., 2013
This paper discusses a variant of the algorithm consisting of a post-optimization phase that evaluates a short list of solutions generated by several independent runs of the RSG method, and shows that this modification significantly improves the large-deviation properties of the algorithm.