Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli
It is well known that weight initialization in deep networks can have a dramatic impact on learning speed. For example, ensuring that the mean squared singular value of a network's input-output Jacobian is O(1) is essential for avoiding exponentially vanishing or exploding gradients. Moreover, in deep linear networks, ensuring that all singular values of the Jacobian are concentrated near 1 can yield a dramatic additional speed-up in learning; this property is known as dynamical isometry. However…
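The deep linear case mentioned above can be illustrated numerically: the input-output Jacobian of a deep linear network is just the product of its weight matrices, so orthogonal initialization keeps every singular value exactly 1, while scaled Gaussian initialization only controls the *mean squared* singular value and lets the spectrum spread with depth. The sketch below is illustrative only (the paper's analysis extends to nonlinear, e.g. sigmoid, networks); the function name and parameters are ours, not the paper's.

```python
import numpy as np

def jacobian_singular_values(depth, width, init="orthogonal", seed=0):
    """Singular values of the input-output Jacobian of a deep *linear*
    network. The Jacobian is the product of the weight matrices."""
    rng = np.random.default_rng(seed)
    J = np.eye(width)
    for _ in range(depth):
        if init == "orthogonal":
            # Random orthogonal matrix via QR: all singular values are
            # exactly 1, so the product also has all singular values 1
            # (dynamical isometry).
            A = rng.standard_normal((width, width))
            W, _ = np.linalg.qr(A)
        else:
            # Scaled Gaussian: mean squared singular value is O(1), but
            # individual singular values spread exponentially with depth.
            W = rng.standard_normal((width, width)) / np.sqrt(width)
        J = W @ J
    return np.linalg.svd(J, compute_uv=False)

s_orth = jacobian_singular_values(depth=50, width=64, init="orthogonal")
s_gauss = jacobian_singular_values(depth=50, width=64, init="gaussian")
print(s_orth.min(), s_orth.max())    # both ~1.0: isometry preserved
print(s_gauss.min(), s_gauss.max())  # wide spread after 50 layers
```

Running this with depth 50 shows the contrast directly: the orthogonal product stays perfectly conditioned, while the Gaussian product's condition number explodes, which is the spectrum behavior the abstract attributes to merely controlling the mean squared singular value.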


