# The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

@article{Karakida2019TheNM, title={The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks}, author={Ryo Karakida and S. Akaho and S. Amari}, journal={ArXiv}, year={2019}, volume={abs/1906.02926} }

Normalization methods play an important role in enhancing the performance of deep learning, yet their theoretical understanding has been limited. To theoretically elucidate the effectiveness of normalization, we quantify the geometry of the parameter space determined by the Fisher information matrix (FIM), which also corresponds to the local shape of the loss landscape under certain conditions. We analyze deep neural networks with random initialization, which is known to suffer from a…
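The abstract's central object, the FIM, can be estimated empirically as the average outer product of per-sample gradients of the network output. The sketch below (an illustrative setup, not the paper's exact model or normalization scheme) computes the empirical FIM of a one-hidden-layer tanh network at random initialization and inspects its eigenvalue spectrum, whose largest eigenvalue governs the local sharpness of the loss landscape:

```python
import numpy as np

# Hedged sketch: for a one-hidden-layer network f(x) = v^T tanh(W x) with
# squared loss, the FIM over the input distribution is F = E_x[grad f(x) grad f(x)^T].
# All dimensions and scalings here are illustrative assumptions.
rng = np.random.default_rng(0)
d, m, n = 10, 100, 500                      # input dim, width, number of samples
W = rng.normal(0, 1 / np.sqrt(d), (m, d))   # random "wide network" initialization
v = rng.normal(0, 1 / np.sqrt(m), m)
X = rng.normal(size=(n, d))

grads = []
for x in X:
    h = np.tanh(W @ x)                      # hidden activations
    dv = h                                  # gradient w.r.t. output weights v
    dW = np.outer(v * (1 - h**2), x)        # gradient w.r.t. hidden weights W
    grads.append(np.concatenate([dv, dW.ravel()]))
G = np.stack(grads)
F = G.T @ G / n                             # empirical Fisher information matrix

eigs = np.linalg.eigvalsh(F)
# The paper's "pathological sharpness" refers to the largest eigenvalue
# typically being far larger than the mean eigenvalue at initialization.
print(f"lambda_max = {eigs[-1]:.4f}, mean eigenvalue = {eigs.mean():.6f}")
```

With width `m` large, the mean eigenvalue shrinks while the largest stays comparatively big, which is the spectral gap the paper's analysis of normalization targets.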

#### 8 Citations

- Pathological spectra of the Fisher information metric and its variants in deep neural networks (2019)
- Theoretical Understanding of Batch-normalization: A Markov Chain Perspective (2020)
- Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective (2020)
- Group Whitening: Balancing Learning Efficiency and Representational Capacity (2020)
- Normalization Techniques in Training DNNs: Methodology, Analysis and Application (2020)
- The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry (2020)
- Theoretical analysis of skip connections and batch normalization from generalization and optimization perspectives (2020)
- Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks (2020)

#### References

Showing 1-10 of 36 references.

- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (2017)
- How Does Batch Normalization Help Optimization? (2018, highly influential)
- Empirical Analysis of the Hessian of Over-Parametrized Neural Networks (2018)
- Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent (2019)
- Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization (2019)
- Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks (2018)