Corpus ID: 27755638

Practical Gauss-Newton Optimisation for Deep Learning

@inproceedings{Botev2017PracticalGO,
  title={Practical Gauss-Newton Optimisation for Deep Learning},
  author={Aleksandar Botev and Hippolyt Ritter and David Barber},
  booktitle={ICML},
  year={2017}
}
We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side…
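The core idea in the abstract, replacing the full Gauss-Newton matrix by its per-layer diagonal blocks, can be illustrated with a small NumPy sketch. The toy two-layer network, the finite-difference Jacobian, and the damping constant below are illustrative assumptions for brevity; the paper itself derives the layer blocks analytically rather than by finite differences.

```python
import numpy as np

def residuals(theta, X, y):
    """Toy two-layer net: output = tanh(X @ W1) @ w2; returns model - target."""
    W1 = theta[:4].reshape(2, 2)   # first-layer weights (parameter block 1)
    w2 = theta[4:]                 # output weights (parameter block 2)
    return np.tanh(X @ W1) @ w2 - y

def numeric_jacobian(f, theta, eps=1e-6):
    """Finite-difference Jacobian of the residual vector (illustration only)."""
    r0 = f(theta)
    J = np.empty((r0.size, theta.size))
    for j in range(theta.size):
        t = theta.copy()
        t[j] += eps
        J[:, j] = (f(t) - r0) / eps
    return J

def block_gauss_newton_step(theta, f, blocks, damping=1e-2):
    """One damped Gauss-Newton step using only the diagonal blocks of J^T J."""
    J = numeric_jacobian(f, theta)
    g = J.T @ f(theta)                 # gradient of 0.5 * ||r||^2
    new = theta.copy()
    for idx in blocks:
        Jb = J[:, idx]                 # columns belonging to this layer
        Gb = Jb.T @ Jb + damping * np.eye(len(idx))  # damped layer block
        new[idx] -= np.linalg.solve(Gb, g[idx])      # per-block solve
    return new

# Synthetic regression data generated by a known parameter vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))
theta_true = rng.normal(size=6)
y = np.tanh(X @ theta_true[:4].reshape(2, 2)) @ theta_true[4:]

f = lambda th: residuals(th, X, y)
loss = lambda th: 0.5 * np.sum(f(th) ** 2)

theta0 = theta_true + 0.3 * rng.normal(size=6)     # start near the solution
blocks = [np.arange(0, 4), np.arange(4, 6)]        # one block per layer
theta = theta0.copy()
for _ in range(20):
    theta = block_gauss_newton_step(theta, f, blocks)
```

Each layer is updated by solving only its own damped block of J^T J, which is the block-diagonal structure the method exploits; off-diagonal cross-layer curvature is discarded, so each solve costs only the cube of that layer's parameter count.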
A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems
Deep Frank-Wolfe For Neural Network Optimization
Small Steps and Giant Leaps: Minimal Newton Solvers for Deep Learning
Structured Stochastic Quasi-Newton Methods for Large-Scale Optimization Problems
A straightforward line search approach on the expected empirical loss for stochastic deep learning problems
Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks
  • Hao Shen
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
Kronecker-factored Quasi-Newton Methods for Convolutional Neural Networks
