Sub-sampled Cubic Regularization for Non-convex Optimization

@inproceedings{Kohler2017SubsampledCR,
  title={Sub-sampled Cubic Regularization for Non-convex Optimization},
  author={Jonas Moritz Kohler and Aur{\'e}lien Lucchi},
  booktitle={ICML},
  year={2017}
}
We consider the minimization of non-convex functions that typically arise in machine learning. Specifically, we focus our attention on a variant of trust region methods known as cubic regularization. This approach is particularly attractive because it escapes strict saddle points and provides stronger convergence guarantees than first- and second-order methods as well as classical trust region methods. However, it suffers from a high computational complexity that makes it impractical for large-scale…
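For orientation, the cubic model this line of work builds on can be sketched as follows. The notation is illustrative and assumed (mini-batch estimates g_k, H_k and penalty weight sigma_k), not quoted from the abstract above; sub-sampling enters through g_k and H_k, which are computed on random mini-batches rather than the full dataset.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Illustrative cubic model with sub-sampled derivatives (notation assumed):
% g_k and H_k are mini-batch estimates of the gradient and Hessian at x_k,
% sigma_k > 0 is the adaptive cubic penalty weight.
\[
  s_k \in \operatorname*{arg\,min}_{s \in \mathbb{R}^d}
  \; m_k(s) = g_k^{\top} s + \tfrac{1}{2}\, s^{\top} H_k\, s
  + \tfrac{\sigma_k}{3}\, \lVert s \rVert^{3},
  \qquad x_{k+1} = x_k + s_k .
\]
\end{document}
```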
On Adaptive Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling
In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, though overall the objective is assumed to be smooth and…
Newton-type methods for non-convex optimization under inexact Hessian information
The canonical problem of finite-sum minimization is considered, and appropriate uniform and non-uniform sub-sampling strategies are provided to construct such Hessian approximations; optimal iteration complexity is obtained for the corresponding sub-sampled trust-region and adaptive cubic regularization methods.
Adaptively Accelerating Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling
In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, though overall the objective is assumed to be smooth and…
Combining Stochastic Adaptive Cubic Regularization with Negative Curvature for Nonconvex Optimization
This is the first approach that combines the negative curvature method with the adaptive cubic-regularized Newton method, and it makes the SANC algorithm more practical for solving large-scale machine learning problems.
Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient
This paper studies a family of stochastic trust region and cubic regularization methods in which gradient, Hessian, and function values are computed inexactly, and shows that the iteration complexity to achieve ε-approximate second-order optimality is of the same order as in previous work where gradient and function values are computed exactly.
Stochastic Variance-Reduced Cubic Regularization Methods
A stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization, which is guaranteed to converge to an approximate local minimum within Õ(n/ ) second-order oracle calls and outperforms state-of-the-art cubic regularization algorithms, including sub-sampled cubic regularization.
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
Detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems demonstrate that these methods not only can be computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but are also highly robust to hyper-parameter settings.
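As a rough sketch of the sub-sampling ingredient shared by these TR/ARC variants, assuming a finite-sum objective f(x) = (1/n) Σ_i f_i(x) and hypothetical per-component callbacks grad_i and hess_i (this is not code from any of the papers listed here):

```python
import numpy as np

def subsampled_derivatives(x, grad_i, hess_i, n, batch_size, rng=None):
    """Uniform sub-sampling of gradient and Hessian for f(x) = (1/n) sum_i f_i(x).

    grad_i(x, i) and hess_i(x, i) are hypothetical callbacks returning the
    gradient vector and Hessian matrix of the i-th component at x.
    """
    rng = rng or np.random.default_rng()
    idx = rng.choice(n, size=batch_size, replace=False)
    g = np.mean([grad_i(x, i) for i in idx], axis=0)   # sub-sampled gradient
    H = np.mean([hess_i(x, i) for i in idx], axis=0)   # sub-sampled Hessian
    return g, H
```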
Inexact Proximal Cubic Regularized Newton Methods for Convex Optimization
In this paper, we use Proximal Cubic regularized Newton Methods (PCNM) to optimize the sum of a smooth convex function and a non-smooth convex function, where we use inexact gradient and Hessian, and…
Inexact Nonconvex Newton-Type Methods
This work proposes inexact variants of trust region and adaptive cubic regularization methods which, to increase efficiency, incorporate various approximations, and explores randomized sub-sampling as a way to construct the gradient and Hessian approximations.
Cubic Regularization with Momentum for Nonconvex Optimization
Theoretically, it is proved that CR with momentum achieves the best possible convergence rate to a second-order stationary point for nonconvex optimization, and that the proposed algorithm can tolerate computational inexactness that reduces the overall sample complexity without degrading the convergence rate.

References

Showing 1–10 of 34 references
Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
This paper identifies the strict saddle property for non-convex problems, which allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
Convergence rates of sub-sampled Newton methods
This paper uses sub-sampling techniques together with low-rank approximation to design a new randomized batch algorithm which possesses a convergence rate comparable to Newton's method, yet has much smaller per-iteration cost.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods, applies the algorithm to deep and recurrent neural network training, and provides numerical evidence for its superior optimization performance.
Convergence Rate Analysis of a Stochastic Trust Region Method for Nonconvex Optimization
We introduce a variant of a traditional trust region method which is aimed at stochastic optimization. While the traditional trust region method relies on exact computations of the gradient and values of…
Finding Local Minima for Nonconvex Optimization in Linear Time
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which is linear in the input representation. The previously fastest…
Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step
We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions,…
Fast Incremental Method for Nonconvex Optimization
This paper analyzes the SAGA algorithm within an Incremental First-order Oracle framework, and shows that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent.
Global convergence rate analysis of unconstrained optimization methods based on probabilistic models
It is shown that, in terms of the order of accuracy, the evaluation complexity of a line-search method based on random first-order models and directions is the same as its counterparts that use deterministic accurate models; the use of probabilistic models only increases the complexity by a constant, which depends on the probability of the models being good.
Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results
An Adaptive Regularisation algorithm using Cubics (ARC) is proposed for unconstrained optimization, generalizing at the same time an unpublished method due to Griewank, an algorithm by Nesterov and Polyak, and a proposal by Weiser et al.
Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming
This paper discusses a variant of the algorithm which applies a post-optimization phase to evaluate a short list of solutions generated by several independent runs of the RSG method, and shows that such a modification significantly improves the large-deviation properties of the algorithm.