# Categorical Reparameterization with Gumbel-Softmax

```bibtex
@article{Jang2017CategoricalRW,
  title   = {Categorical Reparameterization with Gumbel-Softmax},
  author  = {Eric Jang and Shixiang Shane Gu and Ben Poole},
  journal = {ArXiv},
  year    = {2017},
  volume  = {abs/1611.01144}
}
```

Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we present an efficient gradient estimator that replaces the non-differentiable sample from a categorical distribution with a differentiable sample from a novel Gumbel-Softmax distribution. This distribution has the essential property that it can be smoothly…
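The relaxation described above can be sketched in a few lines of numpy. This is an illustrative implementation, not the authors' released code: Gumbel(0, 1) noise is added to the logits and a temperature-scaled softmax replaces the hard argmax, yielding a sample that is differentiable with respect to the logits.

```python
import numpy as np

def sample_gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed (differentiable) sample from a categorical distribution.

    Adds i.i.d. Gumbel(0, 1) noise to the logits and applies a
    temperature-scaled softmax. As temperature -> 0 the sample approaches
    a one-hot vector; larger temperatures give smoother samples.
    """
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))             # Gumbel(0, 1) noise
    y = (logits + g) / temperature
    y = y - y.max()                     # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.3, 0.6]))
sample = sample_gumbel_softmax(logits, temperature=0.5, rng=rng)
# `sample` lies on the probability simplex: nonnegative and summing to 1.
```

At low temperature the output concentrates near a vertex of the simplex, which is why the relaxation can stand in for a discrete sample during training.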


## 2,346 Citations

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

- Computer Science · ArXiv
- 2021

The Gumbel-Max trick is extended to define distributions over structured domains, and a family of recursive algorithms sharing a common feature the authors call a stochastic invariant is highlighted, which allows the construction of reliable gradient estimates and control variates without additional constraints on the model.

GumBolt: Extending Gumbel trick to Boltzmann priors

- Computer Science, Mathematics · NeurIPS
- 2018

GumBolt is significantly simpler than recently proposed methods with Boltzmann machine (BM) priors and outperforms them by a considerable margin, achieving state-of-the-art performance on the permutation-invariant MNIST and Omniglot datasets among models with only discrete latent variables.

Coarse Grained Exponential Variational Autoencoders

- Computer Science, Mathematics · ArXiv
- 2017

This paper derives a semi-continuous latent representation, which approximates a continuous density up to a prescribed precision, and is much easier to analyze than its continuous counterpart because it is fundamentally discrete.

REINFORCing Concrete with REBAR

- 2017

Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the…
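The control-variate idea mentioned above can be shown with a toy numpy example (this illustrates a constant baseline for a score-function estimator, not the REBAR estimator itself; the loss values are made up): subtracting a baseline from the sampled loss leaves the gradient estimate unbiased but shrinks its variance.

```python
import numpy as np

# Score-function (REINFORCE) estimator of d/d logits E[f(x)],
# x ~ Categorical(softmax(logits)), with and without a constant baseline.
rng = np.random.default_rng(0)
logits = np.array([0.0, 1.0, 2.0])
probs = np.exp(logits) / np.exp(logits).sum()
f = np.array([1.0, 4.0, 9.0])   # hypothetical per-category loss values

def grad_samples(n, baseline):
    # Per-sample estimates of E[(f(x) - b) * (onehot(x) - probs)];
    # any constant baseline b leaves the expectation unchanged.
    xs = rng.choice(len(probs), size=n, p=probs)
    return (f[xs] - baseline)[:, None] * (np.eye(len(probs))[xs] - probs)

plain = grad_samples(50_000, baseline=0.0)
with_cv = grad_samples(50_000, baseline=f @ probs)   # b = E[f(x)]
# Both sample means estimate the same gradient, but the baselined
# estimator's per-sample variance is markedly smaller.
```

REBAR goes further than a constant baseline, using the Concrete/Gumbel-Softmax relaxation itself as a control variate to keep the estimator unbiased while reducing variance.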

A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning

- Computer Science, Mathematics · ArXiv
- 2021

This survey article presents background on the Gumbel-max trick, provides a structured overview of its extensions to ease algorithm selection, and gives a comprehensive outline of the machine learning literature in which Gumbel-based algorithms have been leveraged.
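The underlying trick surveyed there is compact enough to sketch directly (an illustrative implementation, not from the survey): perturbing each logit with independent Gumbel(0, 1) noise and taking the argmax yields an exact sample from the corresponding softmax distribution.

```python
import numpy as np

def gumbel_max_sample(logits, rng):
    # Perturb each logit with independent Gumbel(0, 1) noise; the argmax
    # index is an exact sample from Categorical(softmax(logits)).
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=logits.shape)
    return int(np.argmax(logits - np.log(-np.log(u))))

rng = np.random.default_rng(0)
logits = np.array([0.0, 1.0, 2.0])
draws = [gumbel_max_sample(logits, rng) for _ in range(20_000)]
freqs = np.bincount(draws, minlength=3) / len(draws)
probs = np.exp(logits) / np.exp(logits).sum()
# Empirical frequencies approach softmax(logits), roughly [0.09, 0.24, 0.67].
```

Replacing the hard argmax here with a temperature-scaled softmax recovers the Gumbel-Softmax relaxation of the main paper.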

Gaussian mixture models with Wasserstein distance

- Mathematics, Computer Science · ArXiv
- 2018

This paper finds that the discrete latent variable is fully leveraged by the model when trained, without any modifications to the objective function or significant fine-tuning.

Towards Hierarchical Discrete Variational Autoencoders

- Computer Science
- 2019

The Hierarchical Discrete Variational Autoencoder (HD-VAE) is introduced: a hierarchy of variational memory layers in which the Concrete/Gumbel-Softmax relaxation allows maximizing a surrogate of the Evidence Lower Bound by stochastic gradient ascent.

Relaxed Multivariate Bernoulli Distribution and Its Applications to Deep Generative Models

- Computer Science, Mathematics · UAI
- 2020

A multivariate generalization of the Relaxed Bernoulli distribution is proposed, which can be reparameterized and can capture the correlation between variables via a Gaussian copula; its effectiveness is demonstrated in two tasks: density estimation with a Bernoulli VAE and semi-supervised multi-label classification.

GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution

- Computer Science, Mathematics · ArXiv
- 2016

This work evaluates the performance of GANs based on recurrent neural networks with Gumbel-softmax output distributions on the task of generating sequences of discrete elements, using a continuous approximation to a multinomial distribution parameterized in terms of the softmax function.

Coupled Gradient Estimators for Discrete Latent Variables

- Computer Science, Mathematics · ArXiv
- 2021

Gradient estimators based on reparameterizing categorical variables as sequences of binary variables and Rao-Blackwellization are introduced and it is shown that these proposed categorical gradient estimators provide state-of-the-art performance.

## References

Showing 1-10 of 41 references

Discrete Variational Autoencoders

- Mathematics, Computer Science · ICLR
- 2017

A novel method to train a class of probabilistic models with discrete latent variables using the variational autoencoder framework, including backpropagation through the discrete hidden variables, which outperforms state-of-the-art methods on the permutation-invariant MNIST, Omniglot, and Caltech-101 Silhouettes datasets.

Auto-Encoding Variational Bayes

- Mathematics, Computer Science · ICLR
- 2014

A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.

The Neural Autoregressive Distribution Estimator

- Mathematics, Computer Science · AISTATS
- 2011

A new approach for modeling the distribution of high-dimensional vectors of discrete variables inspired by the restricted Boltzmann machine, which outperforms other multivariate binary distribution estimators on several datasets and performs similarly to a large (but intractable) RBM.

Neural Variational Inference and Learning in Belief Networks

- Computer Science, Mathematics · ICML
- 2014

This work proposes a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior and shows that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.

Deep AutoRegressive Networks

- Computer Science, Mathematics · ICML
- 2014

An efficient approximate parameter estimation method based on the minimum description length (MDL) principle is derived, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference.

Regularizing Neural Networks by Penalizing Confident Output Distributions

- Computer Science · ICLR
- 2017

It is found that both label smoothing and the confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters, suggesting the wide applicability of these regularizers.

MuProp: Unbiased Backpropagation for Stochastic Neural Networks

- Computer Science, Mathematics · ICLR
- 2016

MuProp is presented, an unbiased gradient estimator for stochastic networks that improves on the likelihood-ratio estimator by reducing its variance using a control variate based on a first-order Taylor expansion of a mean-field network.

Stochastic Backpropagation and Approximate Inference in Deep Generative Models

- Computer Science, Mathematics · ICML
- 2014

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and…

Variational Inference for Monte Carlo Objectives

- Computer Science, Mathematics · ICML
- 2016

The first unbiased gradient estimator designed for importance-sampled objectives is developed; it is both simpler and more effective than the NVIL estimator proposed for the single-sample variational objective, and is competitive with the currently used biased estimators.

The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

- Computer Science, Mathematics · ICLR
- 2017

Concrete random variables, continuous relaxations of discrete random variables, are a new family of distributions with closed-form densities and a simple reparameterization; the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks is demonstrated.