Publications
GloVe: Global Vectors for Word Representation
TLDR
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context-window methods, and produces a vector space with meaningful substructure.
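A minimal sketch of the GloVe weighted least-squares objective for a single co-occurrence pair, assuming the paper's weighting function f(x) = (x/x_max)^alpha capped at 1; the function and variable names here are illustrative, not from the paper's released code.

```python
import numpy as np

# Minimal sketch of the GloVe objective for one co-occurrence pair.
# Names are illustrative; the weighting f(x) = (x / x_max) ** alpha,
# capped at 1, follows the paper.

def glove_pair_loss(w_i, w_ctx_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    """Weighted squared error for a word/context pair with count x_ij."""
    weight = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0
    return weight * (w_i @ w_ctx_j + b_i + b_j - np.log(x_ij)) ** 2

rng = np.random.default_rng(0)
w_i, w_ctx_j = rng.normal(size=50), rng.normal(size=50)  # word and context vectors
print(glove_pair_loss(w_i, w_ctx_j, 0.0, 0.0, x_ij=12.0))
```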
Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
TLDR
A novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions that outperforms other state-of-the-art approaches on commonly used datasets, without using any pre-defined sentiment lexica or polarity-shifting rules.
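A hypothetical sketch of the core recursive-autoencoder step: two child vectors are composed into a parent vector, and the reconstruction error of the children scores the merge. All weight names and sizes are illustrative.

```python
import numpy as np

# One recursive-autoencoder merge: compose two children into a parent,
# then score the merge by how well the parent reconstructs the children.

def rae_merge(c1, c2, W_enc, b_enc, W_dec, b_dec):
    children = np.concatenate([c1, c2])
    parent = np.tanh(W_enc @ children + b_enc)   # composed phrase vector
    recon = W_dec @ parent + b_dec               # reconstruct both children
    return parent, np.sum((recon - children) ** 2)

d = 10
rng = np.random.default_rng(0)
W_enc, b_enc = 0.1 * rng.normal(size=(d, 2 * d)), np.zeros(d)
W_dec, b_dec = 0.1 * rng.normal(size=(2 * d, d)), np.zeros(2 * d)
parent, err = rae_merge(rng.normal(size=d), rng.normal(size=d),
                        W_enc, b_enc, W_dec, b_dec)
print(parent.shape, err)
```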
Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
TLDR
This work introduces a method for paraphrase detection based on recursive autoencoders (RAEs), training unsupervised RAEs with a novel unfolding objective to learn feature vectors for phrases in syntactic trees and to measure word- and phrase-level similarity between two sentences.
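An illustrative sketch of the dynamic pooling idea: a variable-size matrix of pairwise distances between the units of two sentences is min-pooled down to a fixed grid, giving a fixed-length input for a classifier. The function name and grid size are assumptions for illustration.

```python
import numpy as np

# Dynamic min-pooling: reduce a variable-size distance matrix to a fixed
# n_p x n_p grid so sentence pairs of any lengths yield equal-size features.

def dynamic_min_pool(dist, n_p=4):
    rows = np.array_split(np.arange(dist.shape[0]), n_p)
    cols = np.array_split(np.arange(dist.shape[1]), n_p)
    return np.array([[dist[np.ix_(r, c)].min() for c in cols] for r in rows])

rng = np.random.default_rng(0)
dist = rng.random((7, 11))            # sentences of length 7 and 11
print(dynamic_min_pool(dist).shape)   # (4, 4) regardless of sentence lengths
```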
Deep Neural Networks as Gaussian Processes
TLDR
The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, so that GP predictions typically outperform those of finite-width networks.
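A sketch of the kernel recursion behind the network-GP correspondence for a fully connected ReLU network, using the standard closed-form arc-cosine expectation for ReLU; the weight and bias variances sw2 and sb2, and all names, are illustrative.

```python
import numpy as np

# NNGP kernel recursion for a fully connected ReLU network: propagate the
# input covariance layer by layer with the closed-form ReLU expectation.

def nngp_kernel(x1, x2, depth=3, sw2=2.0, sb2=0.0):
    d = len(x1)
    k11 = sw2 * (x1 @ x1) / d + sb2
    k22 = sw2 * (x2 @ x2) / d + sb2
    k12 = sw2 * (x1 @ x2) / d + sb2
    for _ in range(depth):
        theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
        e12 = np.sqrt(k11 * k22) / (2 * np.pi) * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
        k12 = sw2 * e12 + sb2                                 # E[relu(u) relu(v)]
        k11, k22 = sw2 * k11 / 2 + sb2, sw2 * k22 / 2 + sb2   # E[relu(u)^2]
    return k12

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=20), rng.normal(size=20)
print(nngp_kernel(x1, x2))
```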
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
TLDR
This work uses powerful tools from free probability theory to compute analytically the entire singular-value distribution of a deep network's input-output Jacobian, revealing that controlling this distribution is an important design consideration in deep learning.
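An empirical sketch of the dynamical isometry phenomenon, computed numerically rather than via the paper's free-probability analysis: the singular values of a deep linear network's input-output Jacobian stay at exactly 1 under orthogonal weights, but spread widely under scaled Gaussian weights.

```python
import numpy as np

# Singular values of a deep linear network's Jacobian under Gaussian vs.
# orthogonal weights. A product of orthogonal matrices is orthogonal, so
# every singular value stays at exactly 1 (dynamical isometry).

def jacobian_singvals(depth=50, width=200, orthogonal=False, seed=0):
    rng = np.random.default_rng(seed)
    J = np.eye(width)
    for _ in range(depth):
        if orthogonal:
            W, _ = np.linalg.qr(rng.normal(size=(width, width)))
        else:
            W = rng.normal(size=(width, width)) / np.sqrt(width)
        J = W @ J
    return np.linalg.svd(J, compute_uv=False)

print(jacobian_singvals(orthogonal=True)[[0, -1]])   # ~[1, 1]: isometry
print(jacobian_singvals(orthogonal=False)[[0, -1]])  # widely spread spectrum
```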
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
TLDR
This work demonstrates that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme, and presents an algorithm for generating such random initial orthogonal convolution kernels.
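A sketch, under my reading of the scheme, of a delta-orthogonal convolution initializer: an orthogonal channel-mixing matrix sits at the kernel's spatial center with zeros elsewhere, so the convolution initially acts as an orthogonal map on channels. Names and the channel-layout convention are assumptions.

```python
import numpy as np

# Delta-orthogonal initializer: orthogonal matrix at the spatial center of
# the kernel, zeros elsewhere.

def delta_orthogonal(c_out, c_in, k=3, seed=0):
    assert c_out >= c_in, "needs at least as many output as input channels"
    rng = np.random.default_rng(seed)
    H, _ = np.linalg.qr(rng.normal(size=(c_out, c_in)))  # orthonormal columns
    kernel = np.zeros((c_out, c_in, k, k))
    kernel[:, :, k // 2, k // 2] = H
    return kernel

w = delta_orthogonal(16, 16)
print(w.shape, np.allclose(w[:, :, 1, 1].T @ w[:, :, 1, 1], np.eye(16)))
```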
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
TLDR
This work derives an analogous equivalence for multi-layer convolutional neural networks (CNNs), both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.
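A sketch of the Monte Carlo kernel estimator, shown here on a one-hidden-layer ReLU network for brevity: average the product of outputs over many random weight draws. The same estimator applies to architectures whose kernel has no tractable closed form; sample counts and scalings are illustrative.

```python
import numpy as np

# Monte Carlo estimate of the GP kernel induced by a random network:
# E[f(x1) f(x2)] averaged over independent weight draws.

def mc_kernel(x1, x2, n_samples=2000, width=500, seed=0):
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        W = rng.normal(size=(width, len(x1))) * np.sqrt(2.0 / len(x1))
        v = rng.normal(size=width) / np.sqrt(width)
        total += (v @ np.maximum(W @ x1, 0.0)) * (v @ np.maximum(W @ x2, 0.0))
    return total / n_samples

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=10), rng.normal(size=10)
print(mc_kernel(x1, x2))
```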
Sensitivity and Generalization in Neural Networks: an Empirical Study
TLDR
It is found that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the network's input-output Jacobian, and that this robustness correlates well with generalization.
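A sketch of the sensitivity measure itself: the Frobenius norm of the input-output Jacobian, estimated here by central finite differences on a toy two-layer network. The network and its sizes are illustrative, not the paper's experimental setup.

```python
import numpy as np

# Frobenius norm of the input-output Jacobian, via finite differences.

def net(x, W1, W2):
    return W2 @ np.tanh(W1 @ x)

def jacobian_fro_norm(f, x, eps=1e-5):
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(len(x))]
    return np.linalg.norm(np.stack(cols, axis=1))  # ||J||_F

rng = np.random.default_rng(0)
W1, W2 = 0.3 * rng.normal(size=(32, 8)), 0.3 * rng.normal(size=(4, 32))
x = rng.normal(size=8)
print(jacobian_fro_norm(lambda z: net(z, W1, W2), x))
```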
The Emergence of Spectral Universality in Deep Networks
TLDR
This work uses powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters, including the nonlinearity, the weight and bias distributions, and the depth.
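An empirical companion to the analytic results: the sketch below simulates the Jacobian spectrum of a random tanh network at several depths by accumulating the per-layer factors D_l W_l. Width, depth, and sigma_w are illustrative hyperparameters.

```python
import numpy as np

# Input-output Jacobian spectrum of a random tanh network vs. depth,
# built by chaining D_l @ W_l where D_l = diag(tanh'(h_l)).

def tanh_jacobian_svals(depth, width=300, sigma_w=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)
    J = np.eye(width)
    for _ in range(depth):
        W = rng.normal(size=(width, width)) * sigma_w / np.sqrt(width)
        h = W @ x
        J = np.diag(1.0 - np.tanh(h) ** 2) @ W @ J  # chain rule through tanh
        x = np.tanh(h)
    return np.linalg.svd(J, compute_uv=False)

for d in (2, 10, 50):
    s = tanh_jacobian_svals(d)
    print(d, float(s.max()), float(s.min()))
```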
Geometry of Neural Network Loss Surfaces via Random Matrix Theory
TLDR
This work introduces an analytical framework, based on a set of tools from random matrix theory, for computing an approximation to the distribution of Hessian eigenvalues at critical points of varying energy.
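A toy simulation in the spirit of the paper's random matrix model, which decomposes the Hessian near a critical point into a Wishart (positive semi-definite) part plus a Wigner (indefinite) part whose relative scale grows with the loss value; the exact scaling used below is my assumption for illustration, not the paper's.

```python
import numpy as np

# Toy Hessian model: Wishart part (Gauss-Newton-like, PSD) plus a Wigner
# part scaled by the energy epsilon. The negative-eigenvalue fraction
# grows with epsilon, echoing the paper's qualitative picture.

def model_hessian_eigs(n=400, m=800, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, m)) / np.sqrt(m)
    wishart = X @ X.T                        # positive semi-definite component
    G = rng.normal(size=(n, n))
    wigner = (G + G.T) / np.sqrt(2.0 * n)    # symmetric noise component
    return np.linalg.eigvalsh(wishart + np.sqrt(epsilon) * wigner)

for eps in (0.01, 0.1, 1.0):
    eigs = model_hessian_eigs(epsilon=eps)
    print(eps, float((eigs < 0).mean()))     # negative fraction grows with eps
```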