# Sub-Sampled Newton Methods I: Globally Convergent Algorithms

@article{RoostaKhorasani2016SubSampledNM, title={Sub-Sampled Newton Methods I: Globally Convergent Algorithms}, author={Farbod Roosta-Khorasani and Michael W. Mahoney}, journal={ArXiv}, year={2016}, volume={abs/1601.04737} }

Large-scale optimization problems are ubiquitous in machine learning and data analysis, and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, either to speed up the computations, to implicitly implement a form of statistical regularization, or both. In this paper, we consider second-order iterative optimization algorithms and we provide bounds on the convergence of the variants of Newton's method that incorporate uniform sub-sampling…
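To make the idea of a uniformly sub-sampled Newton step concrete, the following is a minimal sketch, not the paper's algorithm as stated. It assumes an L2-regularized logistic regression objective, an exact (full) gradient, a Hessian estimated from a uniform sample of rows drawn without replacement, and an Armijo backtracking line search as the globalization safeguard. All function names, parameters, and constants (`loss_and_grad`, `subsampled_hessian`, `subsampled_newton`, `lam`, `sample_size`) are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def loss_and_grad(x, A, b, lam):
    """Full regularized logistic loss and gradient; labels b are in {0, 1}."""
    z = A @ x
    loss = np.mean(np.logaddexp(0.0, z) - b * z) + 0.5 * lam * (x @ x)
    grad = A.T @ (sigmoid(z) - b) / len(b) + lam * x
    return loss, grad


def subsampled_hessian(x, A, lam, sample_size, rng):
    """Hessian estimate built from a uniform sample of rows (assumed scheme)."""
    idx = rng.choice(A.shape[0], size=sample_size, replace=False)
    As = A[idx]
    p = sigmoid(As @ x)
    w = p * (1.0 - p)  # per-sample curvature weights
    return (As * w[:, None]).T @ As / sample_size + lam * np.eye(A.shape[1])


def subsampled_newton(A, b, lam=1e-2, sample_size=50, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        loss, grad = loss_and_grad(x, A, b, lam)
        if np.linalg.norm(grad) < 1e-8:
            break
        H = subsampled_hessian(x, A, lam, sample_size, rng)
        step = np.linalg.solve(H, -grad)  # approximate Newton direction
        # Armijo backtracking: shrink t until sufficient decrease holds.
        t = 1.0
        while t > 1e-10:
            new_loss, _ = loss_and_grad(x + t * step, A, b, lam)
            if new_loss <= loss + 1e-4 * t * (grad @ step):
                break
            t *= 0.5
        x = x + t * step
    return x
```

Because the regularizer keeps the sampled Hessian positive definite, the computed direction is a descent direction, so the backtracking loop can always find an acceptable step. The sample size trades per-iteration cost against the accuracy of the curvature estimate.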

## 85 Citations

### Sub-Sampled Newton Methods II: Local Convergence Rates

- Mathematics, Computer Science
- ArXiv
- 2016

The analysis here can be used to complement the results of the basic framework from the companion paper, [38], by exploring algorithmic trade-offs that are important in practice.

### On Adaptive Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling

- Computer Science, Mathematics
- 2018

This paper proposes computing an approximate Hessian matrix by uniformly or non-uniformly sub-sampling the components of the objective of an unconstrained optimization model. It develops both standard and accelerated adaptive cubic regularization approaches and provides theoretical guarantees on global iteration complexity.

### Revisiting Sub-sampled Newton Methods

- Computer Science, Mathematics
- ArXiv
- 2016

This work proposes two new efficient Newton-type methods, Refined Sub-sampled Newton and Refined Sketch Newton, which exhibit a great advantage over existing sub-sampling Newton methods, especially when Hessian-vector multiplication can be calculated efficiently.

### Generalized Hessian approximations via Stein's lemma for constrained minimization

- Computer Science, Mathematics
- 2017 Information Theory and Applications Workshop (ITA)
- 2017

Under certain assumptions, the constrained optimization algorithm is shown to attain a composite convergence rate that is initially quadratic and asymptotically linear. Its performance is validated on widely encountered optimization tasks over several real and synthetic datasets by comparison with classical optimization algorithms.

### Exact and Inexact Subsampled Newton Methods for Optimization

- Computer Science
- 2016

This paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian and not the gradient (the gradient is assumed to be exact).

### Newton-Stein Method: An Optimization Method for GLMs via Stein's Lemma

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2016

This work proposes an alternative way of constructing the curvature information by formulating it as an estimation problem and applying a Stein-type lemma, which allows further improvements through sub-sampling and eigenvalue thresholding and achieves the highest performance compared to various algorithms on several datasets.

### Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

- Computer Science, Mathematics
- AISTATS
- 2020

Stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models are considered and the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size.

### A Subsampling Line-Search Method with Second-Order Results

- Computer Science
- INFORMS Journal on Optimization
- 2022

This paper describes a stochastic algorithm based on negative curvature and Newton-type directions that are computed for a subsampling model of the objective and presents worst-case complexity guarantees for a notion of stationarity tailored to the subsampled context.

### Adaptive Stochastic Variance Reduction for Subsampled Newton Method with Cubic Regularization

- Mathematics, Computer Science
- INFORMS Journal on Optimization
- 2021

This work presents an adaptive variance reduction scheme for a subsampled Newton method with cubic regularization and shows that the expected Hessian sample complexity is [Formula: see text] for finding an approximate local solution with second order guarantee.

### Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization

- Computer Science
- 2020

This work motivates the extension of Newton methods to the SA regime, and argues for the use of the scalable low rank saddle free Newton (LRSFN) method, which avoids forming the Hessian in favor of making a low rank approximation.

## References


### Sub-Sampled Newton Methods II: Local Convergence Rates

- Mathematics, Computer Science
- ArXiv
- 2016

The analysis here can be used to complement the results of the basic framework from the companion paper, [38], by exploring algorithmic trade-offs that are important in practice.

### Convergence rates of sub-sampled Newton methods

- Computer Science
- NIPS
- 2015

This paper uses sub-sampling techniques together with low-rank approximation to design a new randomized batch algorithm which possesses comparable convergence rate to Newton's method, yet has much smaller per-iteration cost.

### Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence

- Computer Science, Mathematics
- SIAM J. Optim.
- 2017

A randomized second-order method for optimization known as the Newton Sketch, based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian, is proposed, which has super-linear convergence with exponentially high probability and convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities.

### Practical inexact proximal quasi-Newton method with global complexity analysis

- Computer Science
- Math. Program.
- 2016

A general framework is proposed, which includes slightly modified versions of existing algorithms and also a new algorithm, which uses limited memory BFGS Hessian approximations, and provides a novel global convergence rate analysis, which covers methods that solve subproblems via coordinate descent.

### Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm

- Computer Science
- AISTATS
- 2009

An optimization algorithm for minimizing a smooth function over a convex set by minimizing a diagonal plus low-rank quadratic approximation to the function, which substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.

### A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning

- Computer Science
- J. Mach. Learn. Res.
- 2010

A new, efficient, exact line search algorithm with proven worst-case time complexity bounds that is comparable to or better than specialized state-of-the-art solvers on a number of publicly available data sets.

### Updating Quasi-Newton Matrices With Limited Storage

- Computer Science
- 1980

An update formula which generates matrices using information from the last m iterations, where m is any number supplied by the user, and the BFGS method is considered to be the most efficient.

### Sample size selection in optimization methods for machine learning

- Computer Science
- Math. Program.
- 2012

A criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient is presented, and an O(1/ε) complexity bound on the total cost of a gradient method is established.

### Faster least squares approximation

- Computer Science, Mathematics
- Numerische Mathematik
- 2011

This work presents two randomized algorithms that provide accurate relative-error approximations to the optimal value and the solution vector of a least squares approximation problem more rapidly than existing exact algorithms.

### Minimizing finite sums with the stochastic average gradient

- Computer Science
- Math. Program.
- 2017

Numerical experiments indicate that the new SAG method often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.