Variance-Preserving Initialization Schemes Improve Deep Network Training: But Which Variance is Preserved?

@article{Luther2019VariancePreservingIS,
  title={Variance-Preserving Initialization Schemes Improve Deep Network Training: But Which Variance is Preserved?},
  author={Kyle Luther and H. Sebastian Seung},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.04942}
}
Before training a neural net, a classic rule of thumb is to initialize the weights randomly so that the variance of the preactivations is preserved across all layers. This rule is traditionally interpreted in terms of the total variance, due to randomness in both the network (weights) and the samples. Alternatively, the rule of thumb can be interpreted as preservation of the \emph{sample} mean and variance for a fixed network, i.e., preactivation statistics computed over the random sample of training examples. …
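The rule of thumb described above can be illustrated numerically. The sketch below (not the paper's code; layer widths, depth, and batch size are arbitrary choices) uses He-style initialization, Var[w] = 2/fan_in, which compensates for ReLU halving the second moment of its input, so the preactivation variance stays roughly constant with depth instead of exploding or vanishing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth, batch = 512, 10, 1000   # width, number of layers, sample count

h = rng.standard_normal((batch, n))  # random input samples
variances = []
for _ in range(depth):
    # He initialization: std = sqrt(2 / fan_in)
    W = rng.standard_normal((n, n)) * np.sqrt(2.0 / n)
    z = h @ W.T                 # preactivation
    variances.append(z.var())   # sample variance over the batch
    h = np.maximum(z, 0.0)      # ReLU activation

# Preactivation variance remains near that of the first layer.
print([round(v, 2) for v in variances])
```

Note that `z.var()` here is a variance computed over the random batch for one fixed draw of the weights, i.e., the "sample variance" interpretation the abstract distinguishes from the total variance over both weights and samples.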
