# Stochastic Backpropagation through Mixture Density Distributions

@article{Graves2016StochasticBT, title={Stochastic Backpropagation through Mixture Density Distributions}, author={Alex Graves}, journal={ArXiv}, year={2016}, volume={abs/1607.05690} }

The ability to backpropagate stochastic gradients through continuous latent distributions has been crucial to the emergence of variational autoencoders and stochastic gradient variational Bayes. The key ingredient is an unbiased and low-variance way of estimating gradients with respect to distribution parameters from gradients evaluated at distribution samples. The "reparameterization trick" provides a class of transforms yielding such estimators for many continuous distributions, including the…
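As a rough illustration of the reparameterization trick the abstract refers to, the sketch below covers only the standard single-Gaussian case, not the mixture density construction that is the paper's subject. The function name `reparam_grads` and the test function `f(z) = z**2` are assumptions chosen for illustration. A sample is written as `z = mu + sigma * eps` with `eps` drawn independently of the parameters, so the gradient evaluated at the sample can be pushed back to `mu` and `sigma` by the chain rule.

```python
import random


def reparam_grads(mu, sigma, f_prime, n_samples=100_000, seed=0):
    """Pathwise Monte Carlo estimate of the gradients of E[f(z)], z ~ N(mu, sigma^2).

    Returns (dE/dmu, dE/dsigma). `f_prime` is the derivative of f with respect to z.
    """
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or sigma
        z = mu + sigma * eps        # sample as a deterministic transform of eps
        df_dz = f_prime(z)          # gradient evaluated at the sample
        g_mu += df_dz               # chain rule: dz/dmu = 1
        g_sigma += df_dz * eps      # chain rule: dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples


# For f(z) = z**2, E[f] = mu**2 + sigma**2, so the exact gradients are (2*mu, 2*sigma).
g_mu, g_sigma = reparam_grads(mu=1.5, sigma=0.7, f_prime=lambda z: 2.0 * z)
```

With `mu=1.5` and `sigma=0.7` the two estimates should land close to the exact values 3.0 and 1.4, up to Monte Carlo noise.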

## 36 Citations

### Automatic Differentiation Variational Inference with Mixtures

- Computer Science
- AISTATS
- 2021

This paper shows how stratified sampling may be used to enable mixture distributions as the approximate posterior, and derives a new lower bound on the evidence analogous to the importance weighted autoencoder (IWAE).

### On the Variational Posterior of Dirichlet Process Deep Latent Gaussian Mixture Models

- Computer Science
- ICML 2020
- 2020

This paper presents an alternative treatment of the variational posterior of the Dirichlet Process Deep Latent Gaussian Mixture Model (DP-DLGMM), showing that the prior cluster parameters and the variational posteriors of the beta distributions and cluster hidden variables can be updated in closed form.

### Overdispersed variational autoencoders

- Computer Science
- 2017 International Joint Conference on Neural Networks (IJCNN)
- 2017

The overdispersed variational autoencoder and the overdispersed importance weighted autoencoder are introduced, which combine overdispersed black box variational inference with the variational autoencoder and the importance weighted autoencoder, respectively.

### Discrete Variational Autoencoders

- Computer Science
- ICLR
- 2017

A novel method to train a class of probabilistic models with discrete latent variables using the variational autoencoder framework, including backpropagation through the discrete hidden variables, which outperforms state-of-the-art methods on the permutation-invariant MNIST, Omniglot, and Caltech-101 Silhouettes datasets.

### Monte Carlo Gradient Estimation in Machine Learning

- Computer Science
- J. Mach. Learn. Res.
- 2020

A broad and accessible survey of the methods for Monte Carlo gradient estimation in machine learning and across the statistical sciences, covering three strategies (the pathwise, score function, and measure-valued gradient estimators) along with their historical development, derivation, and underlying assumptions.

### Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives

- Computer Science
- ICLR
- 2019

A computationally efficient, unbiased drop-in gradient estimator that reduces the variance of the IWAE gradient, the reweighted wake-sleep update (RWS), and the jackknife variational inference (JVI) gradient (Nowozin, 2018).

### Learning Sparse Neural Networks Through Mixture-Distributed Regularization

- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2020

This paper proposes a more general framework for relaxing binary gates through mixture distributions, and introduces a reparameterization method for the smoothed binary gates drawn from mixture distributions to enable efficient gradient-based optimization under the proposed deep learning algorithm.

### Training Latent Variable Models with Auto-encoding Variational Bayes: A Tutorial

- Computer Science
- ArXiv
- 2022

This tutorial motivates AEVB from the classic Expectation Maximization (EM) algorithm, rather than from deterministic auto-encoders, and derives from scratch the AEVB training procedures of a non-deep and several deep latent variable models.

### Bayesian Convolutional Neural Networks with Variational Inference

- Computer Science
- 2018

This work extends Bayesian neural networks with variational inference to convolutional neural networks, so that the approach now covers all three types of network architecture: feedforward, recurrent, and convolutional.

### DOUBLY REPARAMETERIZED GRADIENT ESTIMATORS

- Computer Science
- 2018

A computationally efficient, unbiased drop-in gradient estimator that reduces the variance of the IWAE gradient, the reweighted wake-sleep update (RWS), and the jackknife variational inference (JVI) gradient (Nowozin, 2018).

## References

### Variational Dropout and the Local Reparameterization Trick

- Computer Science
- NIPS
- 2015

The variational dropout method is proposed, a generalization of Gaussian dropout with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.

### Variational Dropout and the Local Reparameterization Trick

- Computer Science
- NIPS 2015
- 2015

This work proposes variational dropout, a generalization of Gaussian dropout where the dropout rates are learned, often leading to better models, and allows inference of more flexibly parameterized posteriors.

### Deep AutoRegressive Networks

- Computer Science
- ICML
- 2014

An efficient approximate parameter estimation method based on the minimum description length (MDL) principle is derived, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference.

### Practical Variational Inference for Neural Networks

- Computer Science
- NIPS
- 2011

This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.

### DRAW: A Recurrent Neural Network For Image Generation

- Computer Science
- ICML
- 2015

The Deep Recurrent Attentive Writer neural network architecture for image generation substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.

### Stochastic Backpropagation and Approximate Inference in Deep Generative Models

- Computer Science
- ICML
- 2014

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and…

### Weight Uncertainty in Neural Networks

- Computer Science
- ArXiv
- 2015

This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.

### Auto-Encoding Variational Bayes

- Computer Science
- ICLR
- 2014

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.