Variance-Preserving Initialization Schemes Improve Deep Network Training: But Which Variance is Preserved?

@article{Luther2019VariancePreservingIS,
  title={Variance-Preserving Initialization Schemes Improve Deep Network Training: But Which Variance is Preserved?},
  author={Kyle Luther and H. Sebastian Seung},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.04942}
}
Before training a neural net, a classic rule of thumb is to randomly initialize the weights so that the variance of the preactivation is preserved across all layers. This is traditionally interpreted using the total variance due to randomness in both networks (weights) and samples. Alternatively, one can interpret the rule of thumb as preservation of the \emph{sample} mean and variance for a fixed network, i.e., preactivation statistics computed over a random sample of training examples. The…
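To make the rule of thumb concrete, below is a minimal NumPy sketch (not the paper's code) of the classic variance-preserving initialization: weights drawn as W ~ N(0, 1/fan_in) so that preactivation variance is approximately preserved from layer to layer in a linear stack. The layer widths, batch size, and the choice of a linear (activation-free) network are illustrative assumptions; with nonlinearities such as ReLU, the scale factor changes (e.g., 2/fan_in for He initialization). The statistics printed are the sample mean and variance over the batch for one fixed draw of weights, matching the "fixed network" interpretation discussed in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
widths = [512] * 8                                # assumed: 8 layers of width 512
x = rng.standard_normal((1024, widths[0]))        # fixed random batch of inputs

def preactivation_stats(x, widths, rng):
    """Propagate a batch through a linear stack with 1/fan_in-scaled weights
    and record the per-layer sample mean and variance of the preactivations."""
    stats = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)
        x = x @ W                                  # preactivation of this layer
        stats.append((x.mean(), x.var()))          # statistics over the sample
    return stats

for layer, (m, v) in enumerate(preactivation_stats(x, widths, rng), start=1):
    print(f"layer {layer}: sample mean {m:+.3f}, sample variance {v:.3f}")
```

With this scaling, the printed sample variance stays close to 1 at every depth; replacing the 1/sqrt(fan_in) factor with a constant makes the variance grow or shrink geometrically with depth, which is the failure mode the rule of thumb is meant to avoid.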