Publications
GloVe: Global Vectors for Word Representation
TLDR: We propose a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods.
  • Citations: 15,457
  • Influential citations: 2,537
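As a rough illustration of the model described in the TLDR above, here is a minimal NumPy sketch of GloVe's weighted least-squares objective over a word co-occurrence matrix. The matrix `X`, embedding dimension `d`, and the weighting parameters `x_max` and `alpha` are illustrative assumptions, not values taken from this page.

```python
import numpy as np

def glove_loss(X, W, W_tilde, b, b_tilde, x_max=100.0, alpha=0.75):
    """Weighted least-squares objective over nonzero co-occurrence counts X[i, j].

    W, W_tilde: (V, d) word and context embeddings; b, b_tilde: (V,) biases.
    """
    i, j = np.nonzero(X)                       # only nonzero co-occurrences contribute
    x = X[i, j]
    f = np.minimum((x / x_max) ** alpha, 1.0)  # weighting function f(x) caps frequent pairs
    err = (W[i] * W_tilde[j]).sum(axis=1) + b[i] + b_tilde[j] - np.log(x)
    return np.sum(f * err ** 2)

# toy usage with a random co-occurrence matrix
V, d = 50, 10
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(V, V)).astype(float)
W, W_t = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_t = np.zeros(V), np.zeros(V)
print(glove_loss(X, W, W_t, b, b_t))
```

Only nonzero co-occurrence counts contribute to the sum, and the weighting function limits the influence of very frequent word pairs.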
Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
TLDR: We introduce a novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions and compare quantitatively against other methods on standard datasets and the EP dataset.
  • Citations: 1,122
  • Influential citations: 124
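Below is a minimal sketch, with assumed names and a toy dimension, of the basic recursive-autoencoder step behind the framework summarized above: two child vectors are composed into a parent, and the parent is scored by how well it reconstructs its children. In the semi-supervised setting described in the TLDR, a softmax over sentiment labels would additionally be attached to each parent vector.

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
W_e, b_e = rng.normal(scale=0.1, size=(d, 2 * d)), np.zeros(d)      # encoder (composition)
W_d, b_d = rng.normal(scale=0.1, size=(2 * d, d)), np.zeros(2 * d)  # decoder (reconstruction)

def rae_step(c1, c2):
    """Compose children c1, c2 into a parent and return (parent, reconstruction error)."""
    children = np.concatenate([c1, c2])
    p = np.tanh(W_e @ children + b_e)       # parent representation
    recon = W_d @ p + b_d                   # reconstruct [c1; c2] from the parent
    err = 0.5 * np.sum((recon - children) ** 2)
    return p, err

c1, c2 = rng.normal(size=d), rng.normal(size=d)
parent, recon_err = rae_step(c1, c2)
print(parent.shape, recon_err)
```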
Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
TLDR: We introduce a method for paraphrase detection based on recursive autoencoders (RAE) and learn feature vectors for phrases in syntactic trees.
  • Citations: 811
  • Influential citations: 86
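As a hedged sketch of the "dynamic pooling" idea in the title: the matrix of pairwise similarities between the phrase vectors of two sentences varies in size with sentence length, so it is pooled down to a fixed grid before classification. The function below min-pools a variable-size matrix to a fixed `n_p x n_p` output; the exact pooling and similarity details are illustrative, not taken from the paper.

```python
import numpy as np

def dynamic_min_pool(S, n_p=4):
    """Min-pool a variable-size similarity matrix S down to a fixed n_p x n_p grid.

    Assumes both dimensions of S are at least n_p.
    """
    rows = np.array_split(np.arange(S.shape[0]), n_p)
    cols = np.array_split(np.arange(S.shape[1]), n_p)
    return np.array([[S[np.ix_(r, c)].min() for c in cols] for r in rows])

rng = np.random.default_rng(0)
S = rng.random((7, 11))           # e.g. pairwise distances between phrase vectors
print(dynamic_min_pool(S).shape)  # (4, 4) regardless of sentence lengths
```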
Deep Neural Networks as Gaussian Processes
TLDR: We derive the exact equivalence between deep networks and Gaussian processes (GPs) in the limit of infinite network width.
  • Citations: 310
  • Influential citations: 59
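A compact way to state the correspondence in the TLDR above is the layerwise kernel recursion for a fully connected network with weight and bias variances σ_w² and σ_b² and pointwise nonlinearity φ; the notation below is a standard presentation of this construction rather than a quote from the paper.

```latex
K^{0}(x, x') = \sigma_b^{2} + \sigma_w^{2}\,\frac{x \cdot x'}{d_{\mathrm{in}}},
\qquad
K^{\ell}(x, x') = \sigma_b^{2} + \sigma_w^{2}\,
  \mathbb{E}_{f \sim \mathcal{GP}\left(0,\,K^{\ell-1}\right)}
  \left[\phi\!\left(f(x)\right)\,\phi\!\left(f(x')\right)\right],
\qquad \ell = 1, \dots, L.
```

In the infinite-width limit the network output is then a draw from GP(0, K^L), so exact Bayesian inference reduces to standard GP regression with this kernel.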
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
TLDR: We show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from first-order Taylor expansion of the network around its initial parameters.
  • Citations: 240
  • Influential citations: 45
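The central object in the TLDR above is the first-order Taylor expansion of the network function around its parameters at initialization; a standard way to write it (notation assumed here, not quoted from the paper) is

```latex
f^{\mathrm{lin}}_{\theta}(x) \;=\; f_{\theta_0}(x) \;+\; \nabla_{\theta} f_{\theta_0}(x)\,\big(\theta - \theta_0\big),
\qquad
\hat{\Theta}_0(x, x') \;=\; \nabla_{\theta} f_{\theta_0}(x)\, \nabla_{\theta} f_{\theta_0}(x')^{\top}.
```

Because the Jacobian is frozen at θ₀, gradient-descent training of the linearized model is an ordinary linear (kernel) regression problem governed by the empirical tangent kernel Θ̂₀.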
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
TLDR: We explore the dependence of the singular value distribution of a deep network's input-output Jacobian on the depth of the network, the weight initialization, and the choice of nonlinearity.
  • Citations: 127
  • Influential citations: 28
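As a hedged illustration of the quantity studied in the TLDR above, the sketch below builds the input-output Jacobian of a deep tanh network at a random input, as a product of layerwise factors diag(φ'(h_ℓ)) W_ℓ, and compares its singular values under Gaussian versus orthogonal weight initialization. Depth, width, nonlinearity, and variances are arbitrary choices for illustration, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def jacobian_singular_values(depth=50, width=200, sigma_w=1.0, orthogonal=False):
    """Singular values of the input-output Jacobian J = prod_l diag(phi'(h_l)) W_l."""
    x = rng.normal(size=width)
    J = np.eye(width)
    for _ in range(depth):
        if orthogonal:
            W, _ = np.linalg.qr(rng.normal(size=(width, width)))  # random orthogonal matrix
            W = sigma_w * W
        else:
            W = rng.normal(scale=sigma_w / np.sqrt(width), size=(width, width))
        h = W @ x
        x = np.tanh(h)
        J = np.diag(1.0 - np.tanh(h) ** 2) @ W @ J   # chain rule through this layer
    return np.linalg.svd(J, compute_uv=False)

for orthogonal in (False, True):
    s = jacobian_singular_values(orthogonal=orthogonal)
    print("orthogonal" if orthogonal else "gaussian", s.max(), s.min())
```

The spread of these singular values is the diagnostic of interest: dynamical isometry corresponds to the whole spectrum concentrating near one.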
Sensitivity and Generalization in Neural Networks: an Empirical Study
TLDR: In practice it is often found that large over-parameterized neural networks generalize better than smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models.
  • Citations: 186
  • Influential citations: 17
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
TLDR: We show that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme, and demonstrate empirically that such a scheme enables efficient training of extremely deep architectures.
  • Citations: 137
  • Influential citations: 12
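The initialization scheme the TLDR alludes to is, in the paper, a "delta-orthogonal" convolutional initialization. The sketch below shows the basic construction under assumed kernel shapes (a random orthogonal block placed at the spatial center of the kernel, zeros elsewhere), with the fan-in/fan-out handling simplified for illustration.

```python
import numpy as np

def delta_orthogonal(k, c_in, c_out, gain=1.0, seed=0):
    """Delta-orthogonal conv kernel of shape (k, k, c_in, c_out), assuming c_out >= c_in.

    All spatial taps are zero except the center, which holds a (c_in, c_out)
    block with orthonormal rows, so the layer initially acts as an orthogonal
    1x1 convolution and preserves signal norms.
    """
    assert c_out >= c_in, "illustrative version assumes a non-shrinking channel count"
    A = np.random.default_rng(seed).normal(size=(c_out, c_out))
    Q, _ = np.linalg.qr(A)                         # random orthogonal matrix
    kernel = np.zeros((k, k, c_in, c_out))
    kernel[k // 2, k // 2] = gain * Q[:c_in, :]    # (c_in, c_out) block with orthonormal rows
    return kernel

w = delta_orthogonal(3, 16, 32)
center = w[1, 1]                                   # (16, 32)
print(np.allclose(center @ center.T, np.eye(16)))  # rows are orthonormal -> True
```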
The Emergence of Spectral Universality in Deep Networks
TLDR: We provide a principled framework for the initialization of weights and the choice of nonlinearities in order to produce well-conditioned Jacobians and fast learning.
  • Citations: 73
  • Influential citations: 10
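For context on "well-conditioned Jacobians": the mean-field quantities that typically enter this kind of analysis are the fixed point q* of the pre-activation variance and the mean squared derivative χ. The equations below are the standard mean-field recursions used in this line of work (notation assumed, not quoted from the paper); dynamical isometry requires χ = 1 together with control of the full Jacobian spectrum.

```latex
q^{\ell} = \sigma_w^{2} \int \mathcal{D}z\; \phi\!\left(\sqrt{q^{\ell-1}}\,z\right)^{2} + \sigma_b^{2},
\qquad
\chi = \sigma_w^{2} \int \mathcal{D}z\; \phi'\!\left(\sqrt{q^{*}}\,z\right)^{2},
```

where Dz denotes the standard Gaussian measure and q* is the fixed point of the first recursion.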
Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
TLDR: We develop a theory of signal propagation in recurrent networks after random initialization, using a combination of mean field theory and random matrix theory, and compare gated architectures with vanilla RNNs.
  • Citations: 69
  • Influential citations: 9
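The object whose spectrum governs signal propagation in this setting is the state-to-state Jacobian of the recurrence. For a vanilla RNN with update h_t = φ(W h_{t-1} + U x_t + b) it takes the form below (notation assumed, not quoted from the paper); the paper's point is that gating reshapes the statistics of these factors so that long products of them stay well conditioned.

```latex
J_t \;=\; \frac{\partial h_t}{\partial h_{t-1}}
      \;=\; \operatorname{diag}\!\big(\phi'\!\left(W h_{t-1} + U x_t + b\right)\big)\, W,
\qquad
\frac{\partial h_T}{\partial h_0} \;=\; J_T\, J_{T-1} \cdots J_1 .
```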