# Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample

```
@inproceedings{Berahas2019QuasiNewtonMF,
  title  = {Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample},
  author = {A. Berahas and Majid Jahani and Peter Richt{\'a}rik and Martin Tak{\'a}{\v{c}}},
  year   = {2019}
}
```

We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. In contrast to the classical variants of these methods, which sequentially build Hessian or inverse-Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the approximations constructed make use of more…
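The sampling idea in the abstract can be sketched as follows: instead of reusing curvature pairs from past iterates, draw fresh displacements around the current point and measure the gradient change along each one. This is a minimal illustrative sketch, not the authors' code; the function name, the Gaussian sampling, and the `radius` parameter are assumptions.

```python
import numpy as np

def sampled_curvature_pairs(grad, w, m=10, radius=0.1, seed=None):
    """Build m curvature pairs (S, Y) by sampling points randomly around
    the current iterate w, in the spirit of sampled quasi-Newton methods.
    `grad` is a callable returning the gradient of the objective.
    Illustrative sketch only; names and defaults are assumptions."""
    rng = np.random.default_rng(seed)
    g = grad(w)
    S, Y = [], []
    for _ in range(m):
        s = radius * rng.standard_normal(w.shape)  # random displacement
        y = grad(w + s) - g                        # gradient change along s
        S.append(s)
        Y.append(y)
    return np.array(S), np.array(Y)
```

For a quadratic objective with Hessian `A`, each sampled pair satisfies `y = A s` exactly, so the pairs encode curvature local to the current iterate rather than history along the trajectory.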


#### References

Showing references 1–10 of 65.

A robust multi-batch L-BFGS method for machine learning

- Mathematics, Computer Science
- Optim. Methods Softw.
- 2020

This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, studies the convergence properties for both convex and non-convex functions, and illustrates the behavior of the algorithm on a distributed computing platform, on binary-classification logistic regression and neural-network training problems that arise in machine learning.
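The stable updating this snippet refers to hinges on evaluating both gradients on the samples shared by consecutive batches, so the difference `y` reflects a change in a single sampled function. A hedged sketch, assuming a `grad_on(w, idx)` callable that returns the gradient restricted to the samples in `idx`:

```python
import numpy as np

def overlap_curvature_pair(grad_on, w_old, w_new, overlap_idx):
    """Curvature pair for multi-batch quasi-Newton updating: both
    gradients are computed on the overlap between consecutive batches,
    so y is a true gradient difference of one function.
    `grad_on(w, idx)` is an assumed interface, not the authors' API."""
    s = w_new - w_old
    y = grad_on(w_new, overlap_idx) - grad_on(w_old, overlap_idx)
    return s, y
```

Without the overlap, `y` would mix gradients of two different sampled objectives, and the resulting BFGS update can be arbitrarily noisy.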

Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

- Computer Science, Mathematics
- SDM
- 2020

Detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust-region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also highly robust to hyper-parameter settings.

A Multi-Batch L-BFGS Method for Machine Learning

- Computer Science, Mathematics
- NIPS
- 2016

This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm on a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.

A Stochastic Quasi-Newton Method for Large-Scale Optimization

- Mathematics, Computer Science
- SIAM J. Optim.
- 2016

A stochastic quasi-Newton method that is efficient, robust, and scalable, which employs the classical BFGS update formula in its limited-memory form, based on the observation that it is beneficial to collect curvature information pointwise, and at regular intervals, through (sub-sampled) Hessian-vector products.
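The limited-memory BFGS machinery mentioned here, shared by several methods on this page, applies the inverse-Hessian approximation to a gradient via the classical two-loop recursion over stored pairs `(s_i, y_i)`. A minimal textbook sketch, not any paper's implementation:

```python
import numpy as np

def lbfgs_two_loop(g, S, Y):
    """Classical L-BFGS two-loop recursion: return an approximation to
    H^{-1} g built from curvature pairs (s_i, y_i), oldest first.
    Textbook sketch; assumes every pair satisfies y^T s > 0."""
    q = g.astype(float).copy()
    rhos = [1.0 / (y @ s) for s, y in zip(S, Y)]
    alphas = []
    # first loop: newest pair to oldest
    for s, y, rho in zip(reversed(S), reversed(Y), reversed(rhos)):
        a = rho * (s @ q)
        q -= a * y
        alphas.append(a)
    # initial scaling H0 = (s^T y / y^T y) I from the most recent pair
    gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])
    r = gamma * q
    # second loop: oldest pair to newest
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r
```

In the stochastic method described above, the pairs fed to this recursion come from sub-sampled Hessian-vector products, `y = H_sub(w) s`, collected every few iterations rather than from per-step gradient differences.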

Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy

- Computer Science, Mathematics
- AISTATS
- 2020

The proposed DANCE method is multistage: the solution of one stage serves as a warm start for the next, which uses more samples; this reduces the number of passes over the data needed to achieve the statistical accuracy of the full training set.

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2011

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees provably as good as those of the best proximal function that could be chosen in hindsight.

On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning

- Mathematics, Computer Science
- SIAM J. Optim.
- 2011

Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration.

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

- Computer Science, Mathematics
- ECML/PKDD
- 2016

adaQN is presented, a stochastic quasi-Newton algorithm for training RNNs that retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme, and is judicious in storing and retaining L-BFGS curvature pairs.

Train faster, generalize better: Stability of stochastic gradient descent

- Computer Science, Mathematics
- ICML
- 2016

We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically…

A Stochastic Quasi-Newton Method for Online Convex Optimization

- Mathematics, Computer Science
- AISTATS
- 2007

Stochastic variants of the well-known BFGS quasi-Newton optimization method, in both full and memory-limited (LBFGS) forms, are developed for online optimization of convex functions; the resulting method asymptotically outperforms previous stochastic gradient methods for parameter estimation in conditional random fields.
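A key device in such online BFGS variants is forming the curvature pair from gradients of the *same* mini-batch at the two iterates, with a damping term that keeps the pair usable even when the stochastic curvature is nearly flat. A hedged sketch; `grad_batch` and the damping weight `lam` are assumptions for illustration:

```python
import numpy as np

def online_bfgs_pair(grad_batch, w_old, w_new, lam=0.1):
    """Curvature pair for an online (stochastic) BFGS update: both
    gradients come from the same mini-batch, and the lam * s term damps
    the pair so that y^T s stays positive. Illustrative sketch only."""
    s = w_new - w_old
    y = grad_batch(w_new) - grad_batch(w_old) + lam * s
    return s, y
```

For a convex mini-batch loss, the gradient difference already satisfies `y @ s >= 0`; the damping term makes the inequality strict, so the BFGS update remains well defined.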