• Computer Science, Mathematics
  • Published in ICML 2015

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

@article{Ioffe2015BatchNA,
  title={Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift},
  author={Sergey Ioffe and Christian Szegedy},
  journal={ArXiv},
  year={2015},
  volume={abs/1502.03167}
}
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
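To make the normalization step the abstract describes concrete, here is a minimal NumPy sketch of the training-time transform applied to one mini-batch; the function name, shapes, and epsilon value are illustrative assumptions rather than the paper's reference implementation, and at inference time the method replaces the mini-batch statistics with population estimates.

# Minimal sketch of the batch-normalization forward pass (training time).
# Names gamma, beta, eps follow the paper's notation; everything else is assumed.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x     : (batch, features) activations feeding a layer
    gamma : (features,) learned scale
    beta  : (features,) learned shift
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero-mean, unit-variance activations
    return gamma * x_hat + beta            # scale and shift restore representational power

# Example: a toy mini-batch of 4 examples with 3 features.
x = np.random.randn(4, 3) * 10.0 + 5.0
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.var(axis=0))   # approximately 0 and 1 per feature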

Citations

Publications citing this paper (10,080 in total; two highly influenced examples below).

Backpropagation-Friendly Eigendecomposition

Cites background & methods; highly influenced.

Latent Space Modelling of Unsteady Flow Subdomains

Cites methods; highly influenced.

Citation Statistics

  • 786 Highly Influenced Citations

  • Averaged 3,037 Citations per year from 2017 through 2019

References

Publications referenced by this paper (20 in total; selected entries below).

On the importance of initialization and momentum in deep learning

Ilya Sutskever, James Martens, George E. Dahl, Geoffrey E. Hinton
  • In ICML (3),
  • 2013
Highly influential.

Mean-normalized stochastic gradient for large-scale deep learning

  • 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014

On the difficulty of training recurrent neural networks

Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
  • In Proceedings of the 30th International Conference on Machine Learning,
  • 2013