Publications
Importance Weighted Autoencoders
TLDR
This work presents the importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE but trained with a strictly tighter log-likelihood lower bound derived from importance weighting, and shows empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.
Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
TLDR
This work presents the convolutional deep belief network, a hierarchical generative model that scales to realistic image sizes, is translation-invariant, and supports efficient bottom-up and top-down probabilistic inference.
Isolating Sources of Disentanglement in Variational Autoencoders
We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. We use this to motivate our $\beta$-TCVAE (Total Correlation
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
TLDR
K-FAC is an efficient method for approximating natural gradient descent in neural networks. It is based on an efficiently invertible approximation of the network's Fisher information matrix that is neither diagonal nor low-rank, and in some cases is completely non-sparse.
Ground truth dataset and baseline evaluations for intrinsic image algorithms
TLDR
This work presents a ground-truth dataset of intrinsic image decompositions for a variety of real-world objects, and uses this dataset to quantitatively compare several existing algorithms.
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
TLDR
This work applies trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature, yielding what the authors describe as the first scalable trust region natural gradient method for actor-critic methods.
Picking Winning Tickets Before Training by Preserving Gradient Flow
TLDR
This work argues that efficient training requires preserving the gradient flow through the network, and proposes a simple but effective pruning criterion called Gradient Signal Preservation (GraSP), which achieves significantly better performance than the baseline at extreme sparsity levels.
Structure Discovery in Nonparametric Regression through Compositional Kernel Search
TLDR
This work defines a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels, and presents a method for searching over this space of structures which mirrors the scientific discovery process.
The Reversible Residual Network: Backpropagation Without Storing Activations
TLDR
This work presents the Reversible Residual Network (RevNet), a variant of ResNets in which each layer's activations can be reconstructed exactly from the next layer's, so the activations for most layers need not be stored in memory during backpropagation.
Automatic Construction and Natural-Language Description of Nonparametric Regression Models
TLDR
This work presents the beginnings of an automatic statistician, focusing on regression problems, which explores an open-ended space of statistical models to discover a good explanation of a data set and then produces a detailed report with figures and natural-language text.
...