Density estimation using Real NVP
This work extends the space of probabilistic models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space.
Deep Knowledge Tracing
This work explores the utility of using Recurrent Neural Networks to model student learning. The learned model can be used for intelligent curriculum design and allows straightforward interpretation and discovery of structure in student tasks.
Unrolled Generative Adversarial Networks
This work introduces a method to stabilize Generative Adversarial Networks by defining the generator objective with respect to an unrolled optimization of the discriminator, and shows how this technique solves the common problem of mode collapse, stabilizes training of GANs with complex recurrent generators, and increases diversity and coverage of the data distribution by the generator.
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
This work shows that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
Exponential expressivity in deep neural networks through transient chaos
The theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.
SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing…
Deep Information Propagation
The presence of dropout destroys the order-to-chaos critical point and therefore strongly limits the maximum trainable depth for random networks, and a mean field theory for backpropagation is developed that shows that the ordered and chaotic phases correspond to regions of vanishing and exploding gradient respectively.
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
This work introduces a modification to the continuous relaxation of discrete variables and shows that the tightness of the relaxation can be adapted online, removing it as a hyperparameter, leading to faster convergence to a better final log-likelihood.
Measuring the Effects of Data Parallelism on Neural Network Training
This work experimentally characterizes the effects of increasing the batch size on training time, as measured by the number of steps necessary to reach a goal out-of-sample error, studies how this relationship varies with the training algorithm, model, and data set, and finds extremely large variation between workloads.
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.
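The iterative forward diffusion process mentioned in the last summary can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a Gaussian diffusion kernel, a constant noise level `beta`, and a one-dimensional two-cluster data set, all chosen here for illustration. The function name `forward_diffusion` is hypothetical. Only the structure-destroying forward pass is shown; the learned reverse process is omitted.

```python
import math
import random
import statistics


def forward_diffusion(x0, betas, rng):
    """Repeatedly apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise.

    Each step shrinks the signal slightly and adds a little Gaussian noise,
    so structure in the data is destroyed gradually rather than all at once.
    """
    xs = list(x0)
    for beta in betas:
        keep = math.sqrt(1.0 - beta)   # fraction of the previous state kept
        scale = math.sqrt(beta)        # standard deviation of the added noise
        xs = [keep * x + scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs


rng = random.Random(0)

# Assumed toy data: two tight clusters near -2 and +2.
x0 = [rng.gauss(-2.0, 0.1) for _ in range(1000)]
x0 += [rng.gauss(2.0, 0.1) for _ in range(1000)]

# Assumed constant noise schedule: 100 steps with beta = 0.1.
betas = [0.1] * 100
x_final = forward_diffusion(x0, betas, rng)

print("initial std:", statistics.pstdev(x0))
print("final mean: ", statistics.mean(x_final))
print("final std:  ", statistics.pstdev(x_final))
```

With this schedule the two clusters are washed out and the final samples are close to a standard normal distribution, which is the kind of simple, tractable end state a reverse process would start from.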