Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe and Christian Szegedy
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from…
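The normalization the abstract refers to is the Batch Normalizing Transform (Algorithm 1 in the paper): for each feature, subtract the mini-batch mean, divide by the mini-batch standard deviation (stabilized by a small epsilon), then apply a learned scale gamma and shift beta. A minimal NumPy sketch, with the function name and the epsilon default chosen for illustration rather than taken from the paper:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch Normalizing Transform over a mini-batch.

    x: array of shape (batch_size, num_features)
    gamma, beta: learned per-feature scale and shift, shape (num_features,)
    """
    mu = x.mean(axis=0)                      # mini-batch mean, per feature
    var = x.var(axis=0)                      # mini-batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize to zero mean, unit variance
    return gamma * x_hat + beta              # scale and shift (restores representational power)
```

With gamma = 1 and beta = 0, the output of each feature has (approximately) zero mean and unit variance over the mini-batch, which is what lets training proceed with higher learning rates and less careful initialization.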
This paper has highly influenced 411 other papers, has 10,149 citations, and has been referenced on Twitter 356 times.


Publications citing this paper.
Showing 1-10 of 5,689 extracted citations

A Deep Convolutional Neural Network-Based Framework for Automatic Fetal Facial Standard Plane Recognition. IEEE Journal of Biomedical and Health Informatics, 2018. Highly influenced.

A Quantization-Friendly Separable Convolution for MobileNets. 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), 2018. Highly influenced.

Efficient Purely Convolutional Text Encoding. Highly influenced.

End-to-End Supervised Lung Lobe Segmentation. 2018 International Joint Conference on Neural Networks (IJCNN), 2018. Highly influenced.

Improving CNN Performance Accuracies With Min–Max Objective. IEEE Transactions on Neural Networks and Learning Systems, 2018. Highly influenced.

Learning Geo-Temporal Image Features. BMVC, 2018. Highly influenced.

Citations per year: Semantic Scholar estimates that this publication has 10,149 citations based on the available data.


Publications referenced by this paper.
Showing 1-10 of 23 references

On the importance of initialization and momentum in deep learning. Sutskever, Ilya, et al. (with Geoffrey E. Hinton). In ICML (3), 2013. Highly influenced.

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015 IEEE International Conference on Computer Vision (ICCV), 2015.

Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

Mean-normalized stochastic gradient for large-scale deep learning. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.

On the difficulty of training recurrent neural networks. Pascanu, Razvan, et al. (with Yoshua Bengio). In Proceedings of the 30th International Conference on Machine Learning, 2013.
