Corpus ID: 174801305

The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

@article{Karakida2019TheNM,
  title={The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks},
  author={Ryo Karakida and S. Akaho and S. Amari},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.02926}
}
  • Ryo Karakida, S. Akaho, S. Amari
  • Published 2019 · Computer Science, Mathematics, Physics · ArXiv
  • Normalization methods play an important role in enhancing the performance of deep learning, but their theoretical understanding has been limited. To theoretically elucidate the effectiveness of normalization, we quantify the geometry of the parameter space determined by the Fisher information matrix (FIM), which also corresponds to the local shape of the loss landscape under certain conditions. We analyze deep neural networks with random initialization, which are known to suffer from a…
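The abstract's central object, the empirical Fisher information matrix F = (1/N) Σₙ ∇θ f(xₙ) ∇θ f(xₙ)^T, is easy to probe numerically. The sketch below is an illustration, not the authors' code: it uses a one-hidden-layer tanh network with He-style random initialization (the architecture, scaling, and sample sizes are assumptions made here for illustration) and compares the largest FIM eigenvalue, which governs the local sharpness of the squared loss at initialization, against the mean eigenvalue as the width grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def fim_spectrum(width, n_samples=256, in_dim=16):
    """Largest and mean eigenvalue of the empirical FIM of a scalar-output,
    one-hidden-layer tanh network at random initialization (toy setup)."""
    # He-style initialization; an illustrative choice, not the paper's exact setup.
    W1 = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(width, in_dim))
    w2 = rng.normal(0.0, np.sqrt(2.0 / width), size=width)
    X = rng.normal(size=(n_samples, in_dim))

    H = np.tanh(X @ W1.T)  # hidden activations, shape (N, width)
    # Per-sample gradient of f(x) = w2 . tanh(W1 x) w.r.t. all parameters:
    #   df/dW1[i, j] = w2[i] * (1 - h_i**2) * x_j,   df/dw2[i] = h_i
    dW1 = (w2 * (1.0 - H**2))[:, :, None] * X[:, None, :]        # (N, width, in_dim)
    J = np.concatenate([dW1.reshape(n_samples, -1), H], axis=1)  # Jacobian, (N, P)

    # Under squared loss, F = J.T @ J / N; its nonzero eigenvalues coincide with
    # those of the much smaller N x N Gram matrix J @ J.T / N.
    gram = J @ J.T / n_samples
    lam_max = np.linalg.eigvalsh(gram)[-1]      # eigvalsh sorts ascending
    lam_mean = np.trace(gram) / J.shape[1]      # trace(F) / P, mean over all P eigenvalues
    return lam_max, lam_mean

for m in (64, 256, 1024):
    lam_max, lam_mean = fim_spectrum(m)
    print(f"width={m:5d}  lambda_max={lam_max:9.2f}  "
          f"lambda_mean={lam_mean:.3e}  ratio={lam_max / lam_mean:.2e}")
```

In this toy run the gap between the largest and the mean eigenvalue widens as the width grows, a small-scale analogue of the pathological sharpness the paper analyzes; inserting a normalization layer into f and repeating the measurement is the natural way to probe the paper's claim that normalization alleviates it.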

    Citations of this paper

    Pathological spectra of the Fisher information metric and its variants in deep neural networks
    Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective (S. Amari, 2020)
    Group Whitening: Balancing Learning Efficiency and Representational Capacity
    Normalization Techniques in Training DNNs: Methodology, Analysis and Application
    The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

    References

    Publications referenced by this paper (showing 1-10 of 36):
    Understanding Batch Normalization (highly influential)
    Visualizing the Loss Landscape of Neural Nets
    How Does Batch Normalization Help Optimization? (highly influential)
    On Exact Computation with an Infinitely Wide Neural Net
    Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent