We propose a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods.

We introduce a novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions and compare quantitatively against other methods on standard datasets and the EP dataset.

We show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from first-order Taylor expansion of the network around its initial parameters.
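The first-order Taylor expansion mentioned above can be made concrete with a toy example. The sketch below (hypothetical sizes and names, not the paper's setup) linearizes a one-hidden-layer tanh network around its initial parameters and checks that the linear model tracks the true network under a small parameter perturbation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network (hypothetical sizes, not from the paper): one tanh
# hidden layer, scalar output, f(theta, x) = w2 . tanh(W1 x).
W1 = rng.normal(size=(8, 2)) / np.sqrt(2)
w2 = rng.normal(size=8) / np.sqrt(8)

def f(W1, w2, x):
    return w2 @ np.tanh(W1 @ x)

def grads(W1, w2, x):
    """Exact gradients of f with respect to W1 and w2."""
    h = np.tanh(W1 @ x)
    gW1 = np.outer(w2 * (1.0 - h**2), x)  # d f / d W1
    gw2 = h                               # d f / d w2
    return gW1, gw2

x = np.array([1.0, -0.5])
gW1, gw2 = grads(W1, w2, x)

# Small random parameter perturbation (a stand-in for a training step).
dW1 = 0.01 * rng.normal(size=W1.shape)
dw2 = 0.01 * rng.normal(size=w2.shape)

# First-order Taylor expansion around the initial parameters:
# f_lin(theta0 + dtheta) = f(theta0) + grad f(theta0) . dtheta
f_lin = f(W1, w2, x) + np.sum(gW1 * dW1) + gw2 @ dw2
f_true = f(W1 + dW1, w2 + dw2, x)
print(abs(f_true - f_lin))  # second-order small
```

In the infinite-width limit the claim is much stronger: the gradients themselves stay close to their initial values throughout training, so this linear model remains accurate for the whole trajectory, not just for small steps.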

We explore the dependence of the singular value distribution of a deep network's input-output Jacobian on the depth of the network, the weight initialization, and the choice of nonlinearity.
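The input-output Jacobian in question is the product of per-layer Jacobians, and its spectrum can be computed directly. The sketch below (a hypothetical configuration, not the paper's exact experiment) builds a random deep tanh network and accumulates the Jacobian layer by layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a deep tanh network with Gaussian weights.
# Depth, width, and sigma_w are illustrative choices; the spectrum
# of the Jacobian depends on all three and on the nonlinearity.
width, depth = 200, 20
sigma_w = 1.0  # weight scale

x = rng.normal(size=width)
J = np.eye(width)
for _ in range(depth):
    W = rng.normal(size=(width, width)) * sigma_w / np.sqrt(width)
    h = W @ x
    # Jacobian of a tanh layer: diag(1 - tanh(h)^2) @ W
    D = np.diag(1.0 - np.tanh(h) ** 2)
    J = D @ W @ J
    x = np.tanh(h)

s = np.linalg.svd(J, compute_uv=False)
print(f"singular values: max {s.max():.3e}, min {s.min():.3e}")
```

Varying `depth`, `sigma_w`, or the nonlinearity in this sketch shows how quickly the spectrum can spread out or collapse, which is exactly the dependence the abstract refers to.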

In practice it is often found that large over-parameterized neural networks generalize better than smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models.

We show that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme, and demonstrate empirically that such schemes enable efficient training of extremely deep architectures.
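The key ingredient in such initialization schemes is that each layer acts as a near-isometry at initialization, so signals neither explode nor vanish with depth. A minimal sketch of the idea, using orthogonal matrices for a dense linear stack (the paper concerns CNNs; the dense case below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def orthogonal(n, rng):
    """Random orthogonal matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))  # fix column signs for uniformity

# Illustrative sizes: a 100-layer stack of orthogonal linear maps.
width, depth = 128, 100
x0 = rng.normal(size=width)
x = x0.copy()
for _ in range(depth):
    x = orthogonal(width, rng) @ x  # each layer preserves the norm

# The signal's norm is unchanged after 100 layers (up to float error),
# whereas generic Gaussian weights would scale it exponentially in depth.
print(np.linalg.norm(x0), np.linalg.norm(x))
```

With a generic Gaussian initialization the norm would grow or shrink exponentially with depth, which is why naive very deep networks are untrainable while isometric initializations remain well-conditioned.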

We provide a principled framework for the initialization of weights and the choice of nonlinearities in order to produce well-conditioned Jacobians and fast learning.

We develop a theory for signal propagation in recurrent networks after random initialization using a combination of mean field theory and random matrix theory and compare it with vanilla RNNs.