• Computer Science, Mathematics
  • Published in CVPR 2018

Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift

@inproceedings{Li2018UnderstandingTD,
  title={Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift},
  author={Xiang Li and Shuo Chen and Xiaolin Hu and Jian Yang},
  booktitle={CVPR},
  year={2018}
}
This paper first answers the question "why do the two most powerful techniques Dropout and Batch Normalization (BN) often lead to a worse performance when they are combined together?" in both theoretical and statistical aspects. Theoretically, we find that Dropout would shift the variance of a specific neural unit when we transfer the state of that network from train to test. However, BN would maintain its statistical variance, which is accumulated from the entire learning procedure, in the test phase.
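The variance-shift behavior described above can be reproduced with a short simulation. The sketch below is illustrative rather than code from the paper: it assumes a single zero-mean, unit-variance pre-BN activation, standard inverted dropout, and a keep probability of 0.5, and it compares the variance BN would accumulate while dropout is active (training) with the variance of the same unit once dropout is disabled (test).

import numpy as np

rng = np.random.default_rng(0)

def inverted_dropout(x, keep_prob, rng):
    # Standard inverted dropout: zero units with probability (1 - keep_prob),
    # then scale the survivors by 1/keep_prob so the mean is preserved.
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

keep_prob = 0.5                                        # assumed retain ratio p
x = rng.normal(0.0, 1.0, size=1_000_000)               # one pre-BN unit: zero mean, unit variance

# Training mode: dropout is active, so BN's running variance tracks the dropped signal.
train_var = inverted_dropout(x, keep_prob, rng).var()

# Test mode: dropout is the identity, so the unit BN normalizes now has a different variance.
test_var = x.var()

print(f"variance accumulated by BN during training: {train_var:.3f}")             # about 1/p = 2.0
print(f"variance actually seen at test time:        {test_var:.3f}")              # about 1.0
print(f"variance shift ratio (test / train):        {test_var / train_var:.3f}")  # about p = 0.5

Under these assumptions the ratio comes out to roughly keep_prob, i.e. BN normalizes test-time activations with a variance estimate about twice the true one; this train/test inconsistency is what the paper names "variance shift".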

Citations

Publications citing this paper.
Showing 3 of 47 citations

Don't ignore Dropout in Fully Convolutional Networks

Highly influenced; cites background & methods (16 excerpts)

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Highly influenced; cites methods, results & background (9 excerpts)

Mode Normalization

Highly influenced; cites background (2 excerpts)

References

Publications referenced by this paper.
Showing 4 of 34 references

Aggregated Residual Transformations for Deep Neural Networks

S. Xie, R. Girshick, P. Dollár, Z. Tu, K. He
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Highly influential (7 excerpts)

Wide Residual Networks

S. Zagoruyko, N. Komodakis
  • BMVC, 2016
Highly influential (5 excerpts)

Densely Connected Convolutional Networks

G. Huang, Z. Liu, K. Q. Weinberger, L. van der Maaten
  • arXiv preprint arXiv:1608.06993
  • 2016
Highly influential (5 excerpts)

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi
  • AAAI, pages 4278–4284
  • 2017
2 excerpts