New Insights and Perspectives on the Natural Gradient Method
@article{Martens2020NewIA,
  title   = {New Insights and Perspectives on the Natural Gradient Method},
  author  = {James Martens},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  pages   = {146:1-146:76}
}
Natural gradient descent is an optimization method traditionally motivated from the perspective of information geometry, and it works well in many applications as an alternative to stochastic gradient descent. In this paper we critically analyze this method and its properties, and show how it can be viewed as a type of approximate 2nd-order optimization method, in which the Fisher information matrix acts as an approximation to the Hessian. This perspective turns out to have significant…
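As context for that claim, the two updates can be set side by side; this is a sketch using standard textbook definitions, not formulas quoted from the paper:

```latex
% Natural gradient step: precondition the gradient with the inverse Fisher.
\[
  \theta_{t+1} = \theta_t - \alpha\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t),
  \qquad
  F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[
    \nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top}
  \right]
\]
% Newton's method uses the Hessian instead:
%   \theta_{t+1} = \theta_t - \alpha\, H(\theta_t)^{-1} \nabla_\theta L(\theta_t).
% For negative log-likelihood losses, F equals the expected Hessian of
% -\log p_\theta under the model's own distribution, which is the sense in
% which a natural gradient step approximates a 2nd-order (Newton) step.
```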
Supplemental Code
- GitHub repo (via Papers with Code): PyTorch implementation of preconditioned stochastic gradient descent (see the sketch below for the general idea).
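The linked repo's code is not reproduced here; as a rough illustration of what "preconditioned stochastic gradient descent" means, here is a minimal, self-contained sketch on a toy least-squares problem. All names and hyperparameters are illustrative assumptions, not the repo's API:

```python
import torch

torch.manual_seed(0)

# Toy least-squares problem: minimize L(w) = ||X w - y||^2 / n.
n, d = 64, 5
X = torch.randn(n, d)
w_true = torch.randn(d)
y = X @ w_true + 0.1 * torch.randn(n)

w = torch.zeros(d, requires_grad=True)
lr, damping = 0.5, 1e-3

for step in range(50):
    loss = ((X @ w - y) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, w)

    # Curvature-based preconditioner for this quadratic model:
    # P = 2 X^T X / n is the exact Hessian of L and, up to a constant
    # noise-variance factor, the Fisher of the Gaussian likelihood.
    # Damping (P + lambda I) keeps the linear solve well conditioned.
    P = 2.0 * (X.T @ X) / n + damping * torch.eye(d)

    with torch.no_grad():
        w -= lr * torch.linalg.solve(P, grad)  # w <- w - lr * P^{-1} g

print(f"final loss: {((X @ w - y) ** 2).mean().item():.6f}")
```

With `P` set to the identity this reduces to plain SGD; with `P` set to the (damped) Fisher it is a natural gradient step, which is exactly the second-order reading of the method that the paper develops.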
171 Citations
- A Formalization of The Natural Gradient Method for General Similarity Measures (GSI, 2019; 1 citation)
- Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks (NeurIPS, 2019; 28 citations)
- Exact natural gradient in deep linear networks and its application to the nonlinear case (NeurIPS, 2018; 21 citations; Highly Influenced)
- Limitations of the empirical Fisher approximation for natural gradient descent (NeurIPS, 2019; 21 citations)
- Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks (NeurIPS, 2020)
- Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization (SDM, 2020; 10 citations)