# Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

@inproceedings{Kool2019StochasticBA, title={Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement}, author={Wouter Kool and Herke van Hoof and Max Welling}, booktitle={ICML}, year={2019} }

The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. [...] The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct…
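The extension mentioned in the abstract can be illustrated for a flat categorical distribution: perturb each log-probability with independent standard Gumbel noise and keep the $k$ largest perturbed values. The sketch below is a minimal illustration under that assumption, not the paper's stochastic beam search over sequences; the names `gumbel_top_k`, `log_probs`, and `rng` are hypothetical and chosen for this example only.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng=random):
    """Sample k distinct indices without replacement via Gumbel-Top-k.

    log_probs: (possibly unnormalized) log-probabilities of each category.
    rng: any object with a random() method (an assumption of this sketch).
    """
    if not 0 <= k <= len(log_probs):
        raise ValueError("k must be between 0 and the number of categories")
    perturbed = []
    for index, log_p in enumerate(log_probs):
        # Standard Gumbel noise: G = -log(-log(U)), with U ~ Uniform(0, 1).
        u = rng.random()
        while u == 0.0:  # random() may return 0.0; avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        perturbed.append((log_p + gumbel, index))
    # Indices of the k largest perturbed values, largest first.
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    log_probs = [math.log(p) for p in probs]
    print(gumbel_top_k(log_probs, k=3))  # three distinct indices
```

With `k=1` this reduces to the ordinary Gumbel-Max trick; for larger `k`, the returned order corresponds to the order in which elements would be drawn when sampling sequentially without replacement.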

## 68 Citations

Ancestral Gumbel-Top-k Sampling for Sampling Without Replacement

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2020

We develop ancestral Gumbel-Top-k sampling: a generic and efficient method for sampling without replacement from discrete-valued Bayesian networks, which includes multivariate discrete distributions,…

Conditional Poisson Stochastic Beam Search

- Computer Science
- ArXiv
- 2021

This work proposes a new method for turning beam search into a stochastic process, conditional Poisson stochastic beam search (CPSBS), and shows how samples generated under the CPSBS design can be used to build consistent estimators and sample diverse sets from sequence models.

A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning

- Computer Science, Mathematics
- ArXiv
- 2021

This survey article presents background on the Gumbel-max trick, provides a structured overview of its extensions to ease algorithm selection, and gives a comprehensive outline of the (machine learning) literature in which Gumbel-based algorithms have been leveraged.

Reparameterizable Subset Sampling via Continuous Relaxations

- Computer Science, Mathematics
- IJCAI
- 2019

A continuous relaxation of subset sampling is defined that provides reparameterization gradients by generalizing the Gumbel-max trick and is used to sample subsets of features in an instance-wise feature selection task for model interpretability, and sub-sequences of neighbors to implement parametric t-SNE by directly comparing the identities of local neighbors.

Incremental Sampling Without Replacement for Sequence Models

- Computer Science, Mathematics
- ICML
- 2020

It is shown that incremental sampling without replacement is applicable to many domains, e.g., program synthesis and combinatorial optimization, and is efficient even for exponentially-large output spaces.

Latent Template Induction with Gumbel-CRFs

- Computer Science, Mathematics
- NeurIPS
- 2020

This work proposes the Gumbel-CRF, a continuous relaxation of the CRF sampling algorithm using a relaxed Forward-Filtering Backward-Sampling (FFBS) approach, which gives more stable gradients than score-function-based estimators, and shows that it learns interpretable templates during training that can be used to control the decoder at test time.

Determinantal Beam Search

- Computer Science
- ACL/IJCNLP
- 2021

Determinantal beam search is proposed, a reformulation of beam search that offers competitive performance against other diverse set generation strategies in the context of language generation, while providing a more general approach to optimizing for diversity.

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

- Computer Science
- ArXiv
- 2021

The Gumbel-Max trick is extended to define distributions over structured domains, and a family of recursive algorithms sharing a common feature the authors call the stochastic invariant is highlighted; this feature allows reliable gradient estimates and control variates to be constructed without additional constraints on the model.

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

- Computer Science, Mathematics
- ICLR
- 2020

This work derives an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples and is closely related to other gradient estimators.

Learning Optimal Tree Models Under Beam Search

- Mathematics, Computer Science
- ICML
- 2020

A novel algorithm for learning optimal tree models under beam search is proposed; the theoretical analysis is verified, and the algorithm is shown to outperform state-of-the-art methods.

## References


Lost Relatives of the Gumbel Trick

- Mathematics, Computer Science
- ICML
- 2017

An entire family of related methods, of which the Gumbel trick is one member, is derived, and it is shown that the new methods have superior properties in several settings with minimal additional computational cost.

Embed and Project: Discrete Sampling with Universal Hashing

- Computer Science, Mathematics
- NIPS
- 2013

This work proposes a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods.

A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models

- Computer Science, Mathematics
- AAAI
- 2018

This work proposes a new training procedure that focuses on the final loss metric (e.g. Hamming loss) evaluated on the output of beam search, and forms a sub-differentiable surrogate objective by introducing a novel continuous approximation of the beam search decoding procedure.

Randomized Optimum Models for Structured Prediction

- Computer Science
- AISTATS
- 2012

This work explores a broader class of models, called Randomized Optimum models (RandOMs), which include Perturb-and-MAP models, and develops likelihood-based learning algorithms for RandOMs, which, empirical results indicate, can produce better models than PM.

Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models

- Computer Science, Mathematics
- 2011 International Conference on Computer Vision
- 2011

A novel way to induce a random field from an energy function on discrete labels by locally injecting noise to the energy potentials, followed by finding the global minimum of the perturbed energy function is proposed.

Categorical Reparameterization with Gumbel-Softmax

- Mathematics, Computer Science
- ICLR
- 2017

It is shown that the Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.

Sequence-to-Sequence Learning as Beam-Search Optimization

- Computer Science, Mathematics
- EMNLP
- 2016

This work introduces a model and beam-search training scheme, based on the work of Daumé III and Marcu (2005), that extends seq2seq to learn global sequence scores, and shows that this system outperforms a highly optimized attention-based seq2seq system and other baselines on three different sequence-to-sequence tasks: word ordering, parsing, and machine translation.

The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

- Computer Science, Mathematics
- ICLR
- 2017

Concrete random variables (continuous relaxations of discrete random variables) are introduced as a new family of distributions with closed-form densities and a simple reparameterization, and the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks is demonstrated.

Classical Structured Prediction Losses for Sequence to Sequence Learning

- Computer Science
- NAACL
- 2018

This work surveys a range of classical objective functions that have been widely used to train linear models for structured prediction, applies them to neural sequence-to-sequence models, and shows that these losses can perform surprisingly well, slightly outperforming beam search optimization in a like-for-like setup.

A* Sampling

- Computer Science, Mathematics
- NIPS
- 2014

This work shows how sampling from a continuous distribution can be converted into an optimization problem over continuous space, and presents a new construction of the Gumbel process and A* Sampling, a practical generic sampling algorithm that searches for the maximum of a Gumbel process using A* search.