Corpus ID: 2428314

Categorical Reparameterization with Gumbel-Softmax

@article{Jang2017CategoricalRW,
  title={Categorical Reparameterization with Gumbel-Softmax},
  author={Eric Jang and Shixiang Shane Gu and Ben Poole},
  journal={ArXiv},
  year={2017},
  volume={abs/1611.01144}
}
Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we present an efficient gradient estimator that replaces the non-differentiable sample from a categorical distribution with a differentiable sample from a novel Gumbel-Softmax distribution. This distribution has the essential property that it can be smoothly… 
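To make the mechanism concrete, here is a minimal NumPy sketch of the sampling step the abstract describes: perturb the logits with Gumbel(0, 1) noise and apply a temperature-controlled softmax. The function name, the temperature value, and the use of plain NumPy (rather than an autodiff framework, which is what actually lets gradients flow through the relaxed sample) are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def sample_gumbel_softmax(logits, temperature, rng=None):
    """Perturb logits with Gumbel(0, 1) noise, then apply a tempered softmax.
    As temperature -> 0 the sample approaches a one-hot categorical draw."""
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / temperature
    y = y - y.max()                      # numerical stability
    return np.exp(y) / np.exp(y).sum()

# Example: a 4-way categorical; a low temperature yields a near one-hot sample.
logits = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
print(sample_gumbel_softmax(logits, temperature=0.5))
```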
Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces
TLDR
The Gumbel-Max trick is extended to define distributions over structured domains, and a family of recursive algorithms sharing a common feature the authors call the stochastic invariant is highlighted, allowing them to construct reliable gradient estimates and control variates without additional constraints on the model.
GumBolt: Extending Gumbel trick to Boltzmann priors
TLDR
The GumBolt is significantly simpler than the recently proposed methods with a BM prior and outperforms them by a considerable margin, achieving state-of-the-art performance on permutation-invariant MNIST and Omniglot datasets in the scope of models with only discrete latent variables.
Coarse Grained Exponential Variational Autoencoders
TLDR
This paper derives a semi-continuous latent representation, which approximates a continuous density up to a prescribed precision, and is much easier to analyze than its continuous counterpart because it is fundamentally discrete.
REINFORCing Concrete with REBAR
Learning in models with discrete latent variables is challenging due to high-variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator…
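For context on the control-variate idea this entry builds on, the sketch below shows a plain score-function (REINFORCE) estimator with a scalar baseline subtracted as a control variate; it is not REBAR itself, and the function name, toy objective, and baseline value are illustrative assumptions.

```python
import numpy as np

def reinforce_grad(logits, f, baseline, n_samples=1000, rng=None):
    """Score-function (REINFORCE) gradient of E_z[f(z)] w.r.t. categorical
    logits, with a scalar baseline subtracted as a control variate."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad = np.zeros_like(logits)
    for _ in range(n_samples):
        z = rng.choice(len(probs), p=probs)
        score = -probs.copy()            # d log p(z) / d logits
        score[z] += 1.0
        grad += (f(z) - baseline) * score
    return grad / n_samples

# Toy objective: reward category 2; the baseline reduces variance
# without biasing the estimate, since E[score] = 0.
print(reinforce_grad(np.zeros(4), f=lambda z: float(z == 2), baseline=0.25))
```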
A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning
TLDR
The goal of this survey article is to present background about the Gumbel-max trick, to provide a structured overview of its extensions to ease algorithm selection, and to present a comprehensive outline of the (machine learning) literature in which Gumbel-based algorithms have been leveraged.
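As a reminder of the underlying trick the survey covers, here is a minimal sketch of the Gumbel-max trick itself: the argmax of Gumbel-perturbed logits is an exact categorical sample. Function names and the toy logits are illustrative.

```python
import numpy as np

def gumbel_max_sample(logits, rng=None):
    """Gumbel-max trick: the argmax of Gumbel-perturbed logits is an exact
    sample from the categorical distribution softmax(logits)."""
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return int(np.argmax(logits + gumbel))

# Empirical check: sample frequencies should approach softmax(logits).
logits = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
draws = [gumbel_max_sample(logits) for _ in range(10000)]
print(np.bincount(draws, minlength=4) / len(draws))   # roughly [0.1, 0.2, 0.3, 0.4]
```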
Gaussian mixture models with Wasserstein distance
TLDR
This paper finds that the discrete latent variable is fully leveraged by the trained model, without any modifications to the objective function or significant fine-tuning.
Towards Hierarchical Discrete Variational Autoencoders
TLDR
The Hierarchical Discrete Variational Autoencoder (HD-VAE) is introduced: a hierarchy of variational memory layers in which the Concrete/Gumbel-Softmax relaxation allows maximizing a surrogate of the evidence lower bound by stochastic gradient ascent.
Relaxed Multivariate Bernoulli Distribution and Its Applications to Deep Generative Models
TLDR
A multivariate generalization of the Relaxed Bernoulli distribution is proposed, which can be reparameterized and can capture the correlation between variables via a Gaussian copula; its effectiveness is demonstrated on two tasks: density estimation with a Bernoulli VAE and semi-supervised multi-label classification.
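The copula idea can be illustrated with a short sketch (an assumption-laden illustration, not the paper's exact construction): draw a correlated Gaussian vector, map it through the normal CDF to obtain coupled uniforms, and feed each uniform into the binary Concrete / relaxed-Bernoulli reparameterization.

```python
import numpy as np
from scipy.stats import norm

def correlated_relaxed_bernoulli(probs, cov, temperature, rng=None):
    """Couple relaxed-Bernoulli samples through a Gaussian copula:
    1. draw a correlated Gaussian vector and push it through the normal CDF,
       giving uniforms that share the copula's dependence structure;
    2. turn each uniform into Logistic(0, 1) noise and apply the binary
       Concrete / relaxed-Bernoulli reparameterization per coordinate."""
    rng = rng or np.random.default_rng()
    g = rng.multivariate_normal(np.zeros(len(probs)), cov)
    u = norm.cdf(g)                                    # correlated uniforms
    logistic_noise = np.log(u) - np.log1p(-u)
    logit_p = np.log(probs) - np.log1p(-probs)
    return 1.0 / (1.0 + np.exp(-(logit_p + logistic_noise) / temperature))

probs = np.array([0.3, 0.7])
cov = np.array([[1.0, 0.8], [0.8, 1.0]])               # strong positive coupling
print(correlated_relaxed_bernoulli(probs, cov, temperature=0.5))
```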
GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution
TLDR
This work evaluates GANs built on recurrent neural networks with Gumbel-softmax output distributions on the task of generating sequences of discrete elements, using a continuous approximation to a multinomial distribution parameterized in terms of the softmax function.
Coupled Gradient Estimators for Discrete Latent Variables
TLDR
Gradient estimators based on reparameterizing categorical variables as sequences of binary variables and on Rao-Blackwellization are introduced, and these categorical gradient estimators are shown to provide state-of-the-art performance.

References

Showing 1-10 of 41 references
Discrete Variational Autoencoders
  • J. Rolfe · ICLR 2017
TLDR
A novel method to train a class of probabilistic models with discrete latent variables using the variational autoencoder framework, including backpropagation through the discrete hidden variables, which outperforms state-of-the-art methods on the permutation-invariant MNIST, Omniglot, and Caltech-101 Silhouettes datasets.
Auto-Encoding Variational Bayes
TLDR
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
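The reparameterization trick at the core of this paper can be sketched in a few lines; the names below are illustrative, and in practice the expression lives inside an autodiff framework so that gradients reach the encoder parameters.

```python
import numpy as np

def reparameterized_gaussian_sample(mu, log_var, rng=None):
    """Reparameterization trick: express z ~ N(mu, sigma^2) as a deterministic,
    differentiable function of (mu, log_var) and parameter-free noise eps."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# The encoder predicts mu and log_var; gradients of any loss on z flow back
# into both through this expression.
print(reparameterized_gaussian_sample(np.zeros(3), np.zeros(3)))
```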
The Neural Autoregressive Distribution Estimator
TLDR
A new approach for modeling the distribution of high-dimensional vectors of discrete variables inspired by the restricted Boltzmann machine, which outperforms other multivariate binary distribution estimators on several datasets and performs similarly to a large (but intractable) RBM.
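A minimal sketch of the NADE factorization for binary vectors, assuming the usual weight-sharing scheme in which hidden activations are accumulated one column at a time; parameter names and sizes are illustrative.

```python
import numpy as np

def nade_log_likelihood(x, W, V, b, c):
    """Log-likelihood of a binary vector x under a minimal NADE: each
    conditional p(x_d = 1 | x_<d) uses a hidden layer whose weights are
    shared across positions, accumulated one column at a time."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    log_lik, a = 0.0, c.copy()             # a accumulates W[:, :d] @ x[:d]
    for d in range(len(x)):
        h = sigmoid(a)
        p = sigmoid(b[d] + V[d] @ h)
        log_lik += x[d] * np.log(p) + (1 - x[d]) * np.log(1 - p)
        a += W[:, d] * x[d]
    return log_lik

D, H = 5, 8
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=D)
print(nade_log_likelihood(x, W=0.1 * rng.normal(size=(H, D)),
                          V=0.1 * rng.normal(size=(D, H)),
                          b=np.zeros(D), c=np.zeros(H)))
```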
Neural Variational Inference and Learning in Belief Networks
TLDR
This work proposes a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior and shows that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
Deep AutoRegressive Networks
TLDR
An efficient approximate parameter estimation method based on the minimum description length (MDL) principle is derived, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference.
Regularizing Neural Networks by Penalizing Confident Output Distributions
TLDR
It is found that both label smoothing and the confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters, suggesting the wide applicability of these regularizers.
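The confidence penalty amounts to subtracting a scaled entropy term from the cross-entropy loss; a minimal sketch, with the weight beta chosen arbitrarily for illustration.

```python
import numpy as np

def confidence_penalized_loss(logits, target, beta=0.1):
    """Cross-entropy minus beta times the entropy of the predicted
    distribution, so over-confident (low-entropy) outputs are penalized."""
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    cross_entropy = -np.log(probs[target])
    entropy = -(probs * np.log(probs)).sum()
    return cross_entropy - beta * entropy

# An over-confident prediction pays an entropy penalty on top of its
# cross-entropy; beta is an illustrative hyperparameter.
print(confidence_penalized_loss(np.array([4.0, 0.0, 0.0]), target=0))
```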
MuProp: Unbiased Backpropagation for Stochastic Neural Networks
TLDR
MuProp is presented, an unbiased gradient estimator for stochastic networks that improves on the likelihood-ratio estimator by reducing its variance with a control variate based on a first-order Taylor expansion of a mean-field network.
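A sketch of the MuProp idea for a single stochastic binary unit, assuming a toy objective with a known derivative: the first-order Taylor expansion of f around the mean-field value serves as the control variate, and its analytic gradient is added back so the estimator stays unbiased. This is a simplified illustration, not the multi-layer estimator from the paper.

```python
import numpy as np

def muprop_grad(theta, f, f_prime, n_samples=1000, rng=None):
    """MuProp-style gradient of E_{z ~ Bernoulli(sigmoid(theta))}[f(z)] for a
    single stochastic binary unit: the first-order Taylor expansion of f
    around the mean-field value mu acts as a control variate, and its
    analytic gradient is added back so the estimator stays unbiased."""
    rng = rng or np.random.default_rng()
    mu = 1.0 / (1.0 + np.exp(-theta))
    dmu_dtheta = mu * (1.0 - mu)
    grad = 0.0
    for _ in range(n_samples):
        z = float(rng.random() < mu)
        score = z - mu                                 # d log p(z|theta) / d theta
        taylor = f(mu) + f_prime(mu) * (z - mu)        # control variate
        grad += (f(z) - taylor) * score
    return grad / n_samples + f_prime(mu) * dmu_dtheta

# Toy objective with a known derivative.
print(muprop_grad(theta=0.0, f=lambda z: (z - 0.49) ** 2,
                  f_prime=lambda z: 2 * (z - 0.49)))
```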
Stochastic Backpropagation and Approximate Inference in Deep Generative Models
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning…
Variational Inference for Monte Carlo Objectives
TLDR
The first unbiased gradient estimator designed for importance-sampled objectives is developed, which is both simpler and more effective than the NVIL estimator proposed for the single-sample variational objective, and is competitive with the currently used biased estimators.
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
TLDR
Concrete random variables (continuous relaxations of discrete random variables) are a new family of distributions with closed-form densities and a simple reparameterization; the effectiveness of Concrete relaxations is demonstrated on density estimation and structured prediction tasks using neural networks.
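A sketch of the closed-form log-density referenced here, following the expression reported for the Concrete distribution (locations alpha, temperature lambda, sample y on the simplex); function and argument names are illustrative assumptions.

```python
import numpy as np
from math import lgamma

def concrete_log_density(y, alphas, temperature):
    """Closed-form log-density of a Concrete(alpha, lambda) sample y on the
    simplex: log[(n-1)!] + (n-1) log(lambda)
             + sum_k [log(alpha_k) - (lambda + 1) log(y_k)]
             - n log(sum_i alpha_i y_i^(-lambda))."""
    n = len(y)
    log_terms = np.log(alphas) - (temperature + 1.0) * np.log(y)
    log_norm = np.log(np.sum(alphas * y ** (-temperature)))
    return (lgamma(n) + (n - 1) * np.log(temperature)
            + log_terms.sum() - n * log_norm)

# Density of the uniform simplex point under a symmetric Concrete distribution.
y = np.full(3, 1.0 / 3.0)
print(concrete_log_density(y, alphas=np.ones(3), temperature=0.5))
```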