Corpus ID: 10284405

New Insights and Perspectives on the Natural Gradient Method

@article{Martens2020NewIA,
  title={New Insights and Perspectives on the Natural Gradient Method},
  author={J. Martens},
  journal={J. Mach. Learn. Res.},
  year={2020},
  volume={21},
  pages={146:1-146:76}
}
Abstract

Natural gradient descent is an optimization method traditionally motivated from the perspective of information geometry, and works well for many applications as an alternative to stochastic gradient descent. In this paper we critically analyze this method and its properties, and show how it can be viewed as a type of approximate 2nd-order optimization method, where the Fisher information matrix can be viewed as an approximation of the Hessian. This perspective turns out to have significant…
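
The update the abstract refers to can be made concrete. Below is a minimal NumPy sketch (not the paper's own code) of one natural gradient step, θ ← θ − η F⁻¹∇L, with F estimated from per-example gradients; the function name, learning rate, and damping constant are illustrative assumptions.

import numpy as np

def natural_gradient_step(theta, grad_loss, per_example_grads, lr=0.1, damping=1e-3):
    """One natural gradient update: theta <- theta - lr * F^{-1} grad_loss.

    F is estimated as the average outer product of per-example
    log-likelihood gradients. If those gradients use labels sampled from
    the model's predictive distribution this approximates the true Fisher;
    with training labels it is the "empirical Fisher", a distinction the
    paper analyzes. Damping keeps the linear solve well-conditioned.
    """
    n, d = per_example_grads.shape
    fisher = per_example_grads.T @ per_example_grads / n  # d x d Fisher estimate
    fisher += damping * np.eye(d)                         # Tikhonov damping
    return theta - lr * np.linalg.solve(fisher, grad_loss)

# Illustrative call with random stand-in gradients:
rng = np.random.default_rng(0)
G = rng.normal(size=(32, 5))   # 32 per-example gradients, 5 parameters
theta = natural_gradient_step(np.zeros(5), G.mean(axis=0), G)

Note that the exact solve with F⁻¹ costs O(d³) in the parameter dimension, which is why practical variants rely on structured approximations of F (e.g. Kronecker-factored approaches such as K-FAC).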

171 Citations (selection)

  • New perspectives on the natural gradient method
  • A Formalization of The Natural Gradient Method for General Similarity Measures
  • Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
  • Exact natural gradient in deep linear networks and its application to the nonlinear case
  • True Asymptotic Natural Gradient Optimization
  • Accelerating Natural Gradient with Higher-Order Invariance
  • Limitations of the Empirical Fisher Approximation
  • Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
