Corpus ID: 6318468

Krylov Subspace Descent for Deep Learning

  • Oriol Vinyals, D. Povey
  • Published in AISTATS 2012
  • Computer Science, Mathematics
  • In this paper, we propose a second-order optimization method to learn models where both the dimensionality of the parameter space and the number of training samples are high. In our method, we construct on each iteration a Krylov subspace formed by the gradient and an approximation to the Hessian matrix, and then use a subset of the training data samples to optimize over this subspace. As with the Hessian Free (HF) method of Martens (2010), the Hessian matrix is never explicitly constructed, and…
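The abstract's core idea — a Krylov subspace spanned by the gradient and repeated Hessian-vector products, with the Hessian never formed explicitly, then a small optimization restricted to that subspace — can be sketched on a toy quadratic. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the paper's stochastic subsampling and preconditioning are omitted, and the exact Hessian-vector product of a quadratic stands in for the Pearlmutter-style product used on neural networks.

```python
import numpy as np

def krylov_subspace(grad, hvp, k):
    """Orthonormal basis for span{g, Hg, H^2 g, ..., H^{k-1} g},
    built from Hessian-vector products only (H is never formed)."""
    n = grad.size
    V = np.zeros((n, k))
    v = grad / np.linalg.norm(grad)
    for j in range(k):
        V[:, j] = v
        w = hvp(v)
        # Gram-Schmidt against all basis vectors built so far
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        nrm = np.linalg.norm(w)
        if nrm < 1e-12:           # subspace became invariant early
            return V[:, :j + 1]
        v = w / nrm
    return V

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, so the Hessian is exactly A.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20)); A = A @ A.T + np.eye(20)
b = rng.standard_normal(20)
x = np.zeros(20)

g = A @ x - b                                  # gradient at x
V = krylov_subspace(g, lambda v: A @ v, k=5)   # 5-dim Krylov basis

# Minimize f over x + V a: a tiny k-dimensional problem.
a = np.linalg.solve(V.T @ A @ V, V.T @ (b - A @ x))
x_new = x + V @ a
```

For a quadratic this subspace step coincides with several iterations of conjugate gradients; in the paper the coefficients over the subspace are instead fit on a held-out subset of training data, which is what distinguishes the method from HF.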
    102 Citations
    • Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling (20 citations)
    • Practical Quasi-Newton Methods for Training Deep Neural Networks (3 citations)
    • Revisiting Natural Gradient for Deep Networks (216 citations, highly influenced)
    • Distributed Hessian-Free Optimization for Deep Neural Network (10 citations)
    • Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks (9 citations)
    • Large Scale Distributed Hessian-Free Optimization for Deep Neural Network (9 citations)
    • A Kronecker-factored approximate Fisher matrix for convolution layers (81 citations, highly influenced)
    • Optimizing Neural Networks with Kronecker-factored Approximate Curvature (381 citations)


    References
    • Deep learning via Hessian-free optimization (722 citations, highly influential)
    • On optimization methods for deep learning (768 citations)
    • Fast Exact Multiplication by the Hessian (482 citations)
    • On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning (189 citations)
    • Enriched Methods for Large-Scale Unconstrained Optimization (42 citations)
    • Understanding the difficulty of training deep feedforward neural networks (9,251 citations)
    • Natural Gradient Works Efficiently in Learning (2,464 citations)