Training Faster by Separating Modes of Variation in Batch-Normalized Models
@article{Kalayeh2020TrainingFB,
  title   = {Training Faster by Separating Modes of Variation in Batch-Normalized Models},
  author  = {M. Kalayeh and M. Shah},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2020},
  volume  = {42},
  pages   = {1483-1500}
}
Batch Normalization (BN) is essential for effectively training state-of-the-art deep Convolutional Neural Networks (CNNs). It normalizes the layer outputs during training using the statistics of each mini-batch. BN accelerates the training procedure by allowing the safe use of large learning rates, and it alleviates the need for careful initialization of the parameters. In this work, we study BN from the viewpoint of Fisher kernels that arise from generative probability models. We show that assuming…
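To make the mechanism the abstract describes concrete, here is a minimal NumPy sketch of the standard per-mini-batch transform: normalize each feature with the mini-batch mean and variance, then apply a learnable scale and shift. This is an illustrative reconstruction of plain batch normalization, not the authors' code; the function name, tensor shapes, and epsilon value are assumptions.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: mini-batch of shape (N, C); gamma, beta: learnable scale/shift of shape (C,)
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized activations
    return gamma * x_hat + beta             # scale and shift restore expressiveness

# Toy usage: a mini-batch of 4 samples with 3 features each.
x = np.random.randn(4, 3)
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))

At test time, the mini-batch statistics are replaced by running estimates accumulated during training, which is what makes the dependence on mini-batch composition a central concern in the normalization literature cited below.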
9 Citations (3 shown)
- Regularizing activations in neural networks via distribution matching with the Wasserstein metric. ICLR, 2020.
- Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift. arXiv, 2019.
- Normalization Techniques in Training DNNs: Methodology, Analysis and Application. arXiv, 2020.