
New Insights and Perspectives on the Natural Gradient Method

@article{martens2020new,
  title={New Insights and Perspectives on the Natural Gradient Method},
  author={J. Martens},
  journal={J. Mach. Learn. Res.},
  year={2020}
}

Fields: Computer Science, Mathematics
Natural gradient descent is an optimization method traditionally motivated from the perspective of information geometry, and it works well for many applications as an alternative to stochastic gradient descent. In this paper we critically analyze this method and its properties, and show how it can be viewed as a type of approximate 2nd-order optimization method, where the Fisher information matrix can be viewed as an approximation of the Hessian. This perspective turns out to have significant…
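The abstract's central point — that the natural gradient update preconditions the gradient with the inverse Fisher information matrix, making it behave like an approximate 2nd-order method — can be illustrated with a minimal sketch. The example below is a hypothetical illustration (not code from the paper): for logistic regression, the Fisher matrix coincides with the expected Hessian of the negative log-likelihood, so the damped update θ ← θ − η F⁻¹∇L is essentially a Newton-like step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y):
    # Mean negative log-likelihood of a logistic regression model.
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def natural_gradient_step(theta, X, y, lr=0.5, damping=1e-4):
    # One natural gradient step: theta <- theta - lr * F^{-1} grad.
    n = X.shape[0]
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / n                       # gradient of the mean NLL
    # Fisher information matrix; for this model it equals the expected
    # Hessian of the NLL, which is the link to 2nd-order optimization.
    F = (X * (p * (1 - p))[:, None]).T @ X / n
    F += damping * np.eye(F.shape[0])              # damping keeps F invertible
    return theta - lr * np.linalg.solve(F, grad)   # solve instead of inverting

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=200) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros(3)
losses = [nll(theta, X, y)]
for _ in range(10):
    theta = natural_gradient_step(theta, X, y)
    losses.append(nll(theta, X, y))
```

Because the Fisher-preconditioned step adapts to the local curvature, the loss typically drops much faster per iteration than with plain gradient descent, at the cost of forming and solving against F at each step.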