• Corpus ID: 28248429

Stochastic Backpropagation through Mixture Density Distributions

  • Alex Graves
  • Published 19 July 2016
  • Computer Science
  • ArXiv
The ability to backpropagate stochastic gradients through continuous latent distributions has been crucial to the emergence of variational autoencoders and stochastic gradient variational Bayes. The key ingredient is an unbiased and low-variance way of estimating gradients with respect to distribution parameters from gradients evaluated at distribution samples. The "reparameterization trick" provides a class of transforms yielding such estimators for many continuous distributions, including the… 
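
As an illustration of the estimator the abstract describes, here is a minimal sketch of the reparameterization trick for a single Gaussian. It is not the paper's mixture-density method, which the truncated abstract does not spell out. The function name `reparam_grads`, the test function f(z) = z², and the parameter values are assumptions chosen so the exact answer is known: E[z²] = μ² + σ², so the gradients are 2μ and 2σ.

```python
import random


def reparam_grads(mu, sigma, n_samples=200_000, seed=0):
    """Pathwise (reparameterized) estimate of the gradients of E[f(z)]
    with respect to mu and sigma, for z ~ N(mu, sigma^2) and f(z) = z**2.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or sigma
        z = mu + sigma * eps        # sample expressed as a transform of eps
        df_dz = 2.0 * z             # gradient of f evaluated at the sample
        g_mu += df_dz * 1.0         # chain rule: dz/dmu = 1
        g_sigma += df_dz * eps      # chain rule: dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples


grad_mu, grad_sigma = reparam_grads(mu=1.5, sigma=0.7)
# Exact values for comparison: 2 * mu = 3.0 and 2 * sigma = 1.4
```

The gradient at each sample is pushed back to the distribution parameters through the deterministic transform, which is why the estimate is unbiased here.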

Automatic Differentiation Variational Inference with Mixtures

This paper shows how stratified sampling may be used to enable mixture distributions as the approximate posterior, and derives a new lower bound on the evidence analogous to the importance weighted autoencoder (IWAE).
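
A rough sketch of the stratified idea, under stated assumptions: the expectation under a Gaussian mixture is written as a weighted sum of per-component expectations, and each component is sampled separately with the Gaussian reparameterization. The helper name `stratified_mixture_estimate`, the choice f(z) = z², and the treatment of the mixture weights as free parameters (ignoring the simplex constraint) are all illustrative choices, not details taken from the paper.

```python
import random


def stratified_mixture_estimate(weights, mus, sigmas, f, df_dz,
                                n_per_component=50_000, seed=0):
    """Estimate E_q[f(z)] for q = sum_k pi_k N(mu_k, sigma_k^2) by sampling
    every component, and return gradients w.r.t. the weights and the means.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    value = 0.0
    grad_weights = []
    grad_mus = []
    for pi_k, mu_k, sigma_k in zip(weights, mus, sigmas):
        f_sum = 0.0
        g_sum = 0.0
        for _ in range(n_per_component):
            z = mu_k + sigma_k * rng.gauss(0.0, 1.0)  # reparameterized sample
            f_sum += f(z)
            g_sum += df_dz(z)                          # dz/dmu_k = 1
        f_mean = f_sum / n_per_component
        value += pi_k * f_mean                         # weight each stratum by pi_k
        grad_weights.append(f_mean)                    # d/dpi_k of sum_k pi_k E_k[f]
        grad_mus.append(pi_k * g_sum / n_per_component)
    return value, grad_weights, grad_mus


value, grad_weights, grad_mus = stratified_mixture_estimate(
    weights=[0.3, 0.7], mus=[-1.0, 2.0], sigmas=[0.5, 1.0],
    f=lambda z: z * z, df_dz=lambda z: 2.0 * z)
# Exact values for comparison: value = 3.875,
# grad_weights = [1.25, 5.0], grad_mus = [-0.6, 2.8]
```

Because every component is visited, no discrete component index is sampled, so the weights receive a gradient without a score-function term.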

On the Variational Posterior of Dirichlet Process Deep Latent Gaussian Mixture Models

This paper presents an alternative treatment of the variational posterior of the Dirichlet Process Deep Latent Gaussian Mixture Model (DP-DLGMM), showing that the prior cluster parameters and the variational posteriors of the beta distributions and cluster hidden variables can be updated in closed form.

Overdispersed variational autoencoders

The overdispersed variational autoencoder and overdispersed importance weighted autoencoder are introduced, which combine overdispersed black-box variational inference with the variational autoencoder and importance weighted autoencoder, respectively.

Discrete Variational Autoencoders

A novel method to train a class of probabilistic models with discrete latent variables using the variational autoencoder framework, including backpropagation through the discrete hidden variables, which outperforms state-of-the-art methods on the permutation-invariant MNIST, Omniglot, and Caltech-101 Silhouettes datasets.

Monte Carlo Gradient Estimation in Machine Learning

A broad and accessible survey of methods for Monte Carlo gradient estimation in machine learning and across the statistical sciences, covering three strategies (the pathwise, score function, and measure-valued gradient estimators) along with their historical development, derivation, and underlying assumptions.

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives

A computationally efficient, unbiased drop-in gradient estimator that reduces the variance of the IWAE gradient, the reweighted wake-sleep update (RWS), and the jackknife variational inference (JVI) gradient (Nowozin, 2018).

Learning Sparse Neural Networks Through Mixture-Distributed Regularization

This paper proposes a more general framework for relaxing binary gates through mixture distributions, and introduces a reparameterization method for the smoothed binary gates drawn from mixture distributions to enable efficient gradient-based optimization under the proposed deep learning algorithm.

Training Latent Variable Models with Auto-encoding Variational Bayes: A Tutorial

This tutorial motivates AEVB from the classic Expectation Maximization (EM) algorithm, rather than from deterministic auto-encoders, and derives from scratch the AEVB training procedures for one non-deep and several deep latent variable models.

Bayesian Convolutional Neural Networks with Variational Inference

This work extends the family of Bayesian neural networks with variational inference, which now encompasses all three types of network architecture: feedforward, recurrent, and convolutional neural networks.

Variational Dropout and the Local Reparameterization Trick

Variational dropout is proposed, a generalization of Gaussian dropout with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.

Variational Dropout and the Local Reparameterization Trick

This work proposes variational dropout, a generalization of Gaussian dropout where the dropout rates are learned, often leading to better models, and allows inference of more flexibly parameterized posteriors.

Deep AutoRegressive Networks

An efficient approximate parameter estimation method based on the minimum description length (MDL) principle is derived, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference.

Practical Variational Inference for Neural Networks

This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.

DRAW: A Recurrent Neural Network For Image Generation

The Deep Recurrent Attentive Writer neural network architecture for image generation substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.

Stochastic Backpropagation and Approximate Inference in Deep Generative Models

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and

Weight Uncertainty in Neural Networks

This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.

Auto-Encoding Variational Bayes

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.