Corpus ID: 5808102

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

@inproceedings{Ioffe2015BatchNA,
  title={Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift},
  author={Sergey Ioffe and Christian Szegedy},
  booktitle={ICML},
  year={2015}
}
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. [...] Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a…
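The per-mini-batch normalization the abstract describes can be sketched in a few lines of NumPy. This is a minimal training-mode illustration, not the paper's full method (it omits the running statistics used at inference and the backward pass); the function and parameter names are illustrative.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x: (batch, features) activations.
    gamma, beta: learnable per-feature scale and shift, which preserve
    the layer's representational capacity after normalization.
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta

# Illustrative usage: activations drawn far from zero mean / unit variance.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8))
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(6))  # approximately zero per feature
print(y.std(axis=0).round(3))   # approximately one per feature
```

With gamma = 1 and beta = 0 the output is simply the whitened mini-batch; during training these parameters are learned by gradient descent along with the rest of the network.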
    19,715 Citations

Citations

    • Batch Normalization: Is Learning An Adaptive Gain and Bias Necessary?
    • Accelerating Training of Deep Neural Networks with a Standardization Loss
    • Internal Covariate Shift Reduction in Encoder-Decoder Convolutional Neural Networks
    • Mean Shift Rejection: Training Deep Neural Networks Without Minibatch Statistics or Normalization
    • Training Deep Neural Networks Without Batch Normalization
    • Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
    • Training Faster by Separating Modes of Variation in Batch-Normalized Models (M. Kalayeh and M. Shah, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020)
    • Layer Normalization
    • Batch-normalized Mlpconv-wise supervised pre-training network in network
