Newton-type methods for non-convex optimization under inexact Hessian information

Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney. Mathematical Programming.
We consider variants of trust-region and adaptive cubic regularization methods for non-convex optimization, in which the Hessian matrix is approximated. Under a certain condition on the inexact Hessian, and using approximate solutions of the corresponding sub-problems, we provide iteration complexities to achieve $\varepsilon$-approximate second-order optimality, which have been shown to be tight. Our Hessian approximation condition offers a range of advantages as compared with the prior works…

On Adaptive Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling

This paper proposes computing an approximate Hessian matrix by uniformly or non-uniformly sub-sampling the components of the objective of an unconstrained optimization model. It develops both standard and accelerated adaptive cubic regularization approaches and provides theoretical guarantees on their global iteration complexity.
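The uniform sub-sampling idea can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it assumes a finite-sum least-squares objective $f(x) = \frac{1}{n}\sum_i \frac{1}{2}(a_i^\top x - b_i)^2$, whose component Hessians are $a_i a_i^\top$, and the function names `full_hessian` and `subsampled_hessian` are hypothetical.

```python
import random


def full_hessian(A):
    """Exact Hessian (1/n) * sum_i a_i a_i^T of the assumed least-squares objective.

    A is a list of n rows a_i, each a list of d floats.
    """
    n, d = len(A), len(A[0])
    H = [[0.0] * d for _ in range(d)]
    for a in A:
        for j in range(d):
            for k in range(d):
                H[j][k] += a[j] * a[k] / n
    return H


def subsampled_hessian(A, sample_size, rng):
    """Hessian estimate from `sample_size` rows drawn uniformly without replacement.

    Averaging over the sampled rows keeps the estimate unbiased for the
    full Hessian under uniform sampling.
    """
    idx = rng.sample(range(len(A)), sample_size)
    return full_hessian([A[i] for i in idx])


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
    H = full_hessian(A)
    H_s = subsampled_hessian(A, 50, rng)
    # Largest entrywise deviation between the estimate and the exact Hessian.
    print(max(abs(H[j][k] - H_s[j][k]) for j in range(3) for k in range(3)))
```

Non-uniform sampling would replace `rng.sample` with a draw from a chosen probability distribution and reweight each sampled term accordingly; that variant is not shown here.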

Inexact restoration with subsampled trust-region methods for finite-sum minimization

This work proposes a new trust-region method which employs suitable approximations of the objective function, gradient and Hessian built via random subsampling techniques and shows that the new procedure is more efficient, in terms of overall computational cost, than the standard trust-region scheme with subsampled Hessians.

Convergence of Newton-MR under Inexact Hessian Information

This work draws from matrix perturbation theory to estimate the distance between the subspaces underlying the exact and approximate Hessian matrices in Newton-MR, which extends the application range of the classical Newton-CG beyond convexity to invex problems.

Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient

This paper studies a family of stochastic trust-region and cubic regularization methods in which gradient, Hessian and function values are computed inexactly, and shows that the iteration complexity to achieve $\epsilon$-approximate second-order optimality is of the same order as in previous work where gradient and function values are computed exactly.

Adaptively Accelerating Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling

This paper proposes computing an approximate Hessian matrix by uniformly or non-uniformly sub-sampling the components of the objective, and develops accelerated adaptive cubic regularization approaches that provide theoretical guarantees on global iteration complexity of $O(\epsilon^{-1/3})$ with high probability.

Accelerating Adaptive Cubic Regularization of Newton's Method via Random Sampling

This paper proposes to compute an approximated Hessian matrix by either uniformly or non-uniformly sub-sampling the components of the objective and develops accelerated adaptive cubic regularization approaches, which provide theoretical guarantees on global iteration complexity of $O(\epsilon^{-1/3})$ with high probability.

Inexact Newton-CG algorithms with complexity guarantees

This approach is a first attempt to introduce inexact Hessian and/or gradient information into the Newton-CG algorithm of Royer & Wright, and derives iteration complexity bounds for achieving $\epsilon$-approximate second-order optimality that match best-known lower bounds.

First-Order Methods for Nonconvex Quadratic Minimization

When the authors use Krylov subspace solutions to approximate the cubic-regularized Newton step, the results recover the strongest known convergence guarantees to approximate second-order stationary points of general smooth nonconvex functions.

Stochastic analysis of an adaptive cubic regularization method under inexact gradient evaluations and dynamic Hessian accuracy

This paper extends the adaptive cubic regularization method with dynamic inexact Hessian information for nonconvex optimization: it retains the adaptive accuracy requirements for Hessian approximations introduced in the earlier work it builds on, and additionally employs inexact computations of the gradient.

Cubic Regularization Methods with Second-Order Complexity Guarantee Based on a New Subproblem Reformulation

A new reformulation of the cubic regularization subproblem is proposed: an unconstrained convex problem that requires computing the minimum eigenvalue of the Hessian. A variant of adaptive regularization with cubics (ARC) is derived from it.
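For orientation, the cubic regularization subproblem referred to here is usually stated as below. The notation is the standard one and is an assumption, not taken from the paper: $g_k$ and $H_k$ denote the gradient and the Hessian (or its approximation) at the iterate $x_k$, and $\sigma_k > 0$ is the regularization weight. The paper's new reformulation is not reproduced.

```latex
\min_{s \in \mathbb{R}^n} \; m_k(s) = f(x_k) + g_k^\top s + \tfrac{1}{2}\, s^\top H_k s + \tfrac{\sigma_k}{3}\, \|s\|^3
```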



Sub-sampled Cubic Regularization for Non-convex Optimization

This work provides a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods, and is the first work that gives global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions.

Optimal Newton-type methods for nonconvex smooth optimization problems

A general class of second-order iterations for unconstrained optimization that includes regularization and trust-region variants of Newton’s method is considered, and the analysis implies that cubic regularization has optimal worst-case evaluation complexity within this class of second-order methods.

The Conjugate Gradient Method and Trust Regions in Large Scale Optimization

This paper shows that an approximate solution of the trust-region problem may be found by the preconditioned conjugate gradient method, and that the resulting method has the same convergence properties as existing methods based on the dogleg strategy using an approximate Hessian.
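The truncated conjugate gradient idea summarized above can be sketched as follows. This is a simplified, unpreconditioned variant written for illustration under stated assumptions: the model is $m(p) = g^\top p + \frac{1}{2} p^\top B p$ with trust-region radius `delta`, matrices are plain lists of lists, and all names (`steihaug_cg`, `to_boundary`) are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def to_boundary(p, d, delta):
    """Return tau >= 0 such that ||p + tau * d|| = delta (p lies inside the ball)."""
    a, b, c = dot(d, d), 2.0 * dot(p, d), dot(p, p) - delta ** 2
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def steihaug_cg(B, g, delta, tol=1e-10, max_iter=None):
    """Approximately minimize g.p + 0.5 * p.B.p subject to ||p|| <= delta."""
    n = len(g)
    p = [0.0] * n
    r = list(g)                 # residual B p + g at p = 0
    d = [-ri for ri in r]       # first direction: steepest descent
    if math.sqrt(dot(r, r)) <= tol:
        return p
    for _ in range(max_iter or 2 * n):
        Bd = matvec(B, d)
        dBd = dot(d, Bd)
        if dBd <= 0.0:
            # Non-positive curvature: follow d to the trust-region boundary.
            tau = to_boundary(p, d, delta)
            return [pi + tau * di for pi, di in zip(p, d)]
        alpha = dot(r, r) / dBd
        p_next = [pi + alpha * di for pi, di in zip(p, d)]
        if math.sqrt(dot(p_next, p_next)) >= delta:
            # The CG step leaves the region: stop on the boundary instead.
            tau = to_boundary(p, d, delta)
            return [pi + tau * di for pi, di in zip(p, d)]
        r_next = [ri + alpha * bi for ri, bi in zip(r, Bd)]
        if math.sqrt(dot(r_next, r_next)) <= tol:
            return p_next
        beta = dot(r_next, r_next) / dot(r, r)
        d = [-rn + beta * di for rn, di in zip(r_next, d)]
        p, r = p_next, r_next
    return p


if __name__ == "__main__":
    B = [[2.0, 0.0], [0.0, 1.0]]
    g = [-2.0, -1.0]
    print(steihaug_cg(B, g, delta=10.0))  # interior case: close to the Newton step
    print(steihaug_cg(B, g, delta=1.0))   # boundary case: step of length delta
```

With a large radius and a positive definite `B` the iteration reduces to ordinary CG on $Bp = -g$; with a small radius or negative curvature it stops on the boundary, which is what makes the step usable with an indefinite or approximate Hessian.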

Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence

A randomized second-order method for optimization known as the Newton Sketch, based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian, is proposed, which has super-linear convergence with exponentially high probability and convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities.

On solving trust-region and other regularised subproblems in optimization

Methods that obtain the solution of a sequence of parametrized linear systems by factorization are used, and enhancements using high-order polynomial approximation and inverse iteration ensure that the resulting method is both globally and asymptotically at least superlinearly convergent in all cases.

A recursive ℓ∞-trust-region method for bound-constrained nonlinear optimization

A recursive trust-region method is introduced for the solution of bound-constrained nonlinear nonconvex optimization problems for which a hierarchy of descriptions exists, which uses the infinity norm to define the shape of the trust region.

Iterative Methods for Finding a Trust-region Step

This work proposes an extension of the Steihaug-Toint method that allows a solution to be calculated to any prescribed accuracy and includes a parameter that allows the user to take advantage of the tradeoff between the overall number of function evaluations and matrix-vector products associated with the underlying trust-region method.

A Subspace Minimization Method for the Trust-Region Step

A method is proposed that allows the trust-region norm to be defined independently of the preconditioner over a sequence of evolving low-dimensional subspaces, and it is shown that the method can require significantly fewer function evaluations than other methods.

Convergence Rate Analysis of a Stochastic Trust Region Method for Nonconvex Optimization

This work analyzes a variant of a traditional trust-region method aimed at stochastic optimization and provides a bound on the expected number of iterations the stochastic algorithm requires to reach accuracy $\epsilon$, for any $\epsilon > 0$.

An inexact regularized Newton framework with a worst-case iteration complexity of $O(\epsilon^{-3/2})$ for nonconvex optimization

An algorithm for solving smooth nonconvex optimization problems is proposed that, in the worst case, takes $O(\epsilon^{-3/2})$ iterations to drive the norm of the gradient of the objective function below a