Corpus ID: 236469602

Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample

  title={Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample},
  author={A. Berahas and Majid Jahani and Peter Richt{\'a}rik and Martin Tak'avc},
We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. Contrary to the classical variants of these methods that sequentially build Hessian or inverse Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the approximations constructed make use of more… Expand


A robust multi-batch L-BFGS method for machine learning*
This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, studies the convergence properties for both convex and non-convex functions, and illustrates the behaviour of the algorithm in a distributed computing platform on binary classification logistic regression and neural network training problems that arise in machine learning. Expand
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
Detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems demonstrate that these methods not only can be computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also they are highly robust to hyper-parameter settings. Expand
A Multi-Batch L-BFGS Method for Machine Learning
This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases. Expand
A Stochastic Quasi-Newton Method for Large-Scale Optimization
A stochastic quasi-Newton method that is efficient, robust and scalable, and employs the classical BFGS update formula in its limited memory form, based on the observation that it is beneficial to collect curvature information pointwise, and at regular intervals, through (sub-sampled) Hessian-vector products. Expand
Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy
The proposed DANCE method is multistage in which the solution of a stage serves as a warm start for the next stage which contains more samples and reduces the number of passes over data to achieve the statistical accuracy of the full training set. Expand
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal functions that can be chosen in hindsight. Expand
On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning
Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. Expand
adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs
adaQN is presented, a stochastic quasi-Newton algorithm for training RNNs that retains a low per-iteration cost while allowing for non-diagonal scaling through a Stochastic L-BFGS updating scheme and is judicious in storing and retaining L- BFGS curvature pairs. Expand
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmicallyExpand
A Stochastic Quasi-Newton Method for Online Convex Optimization
Stochastic variants of the wellknown BFGS quasi-Newton optimization method, in both full and memory-limited (LBFGS) forms, are developed for online optimization of convex functions, which asymptotically outperforms previous stochastic gradient methods for parameter estimation in conditional random fields. Expand