Corpus ID: 76662039

Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

Wouter Kool, Herke van Hoof, Max Welling
The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. […] The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct…
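The extension mentioned in the first sentence of the abstract can be illustrated for the plain categorical case. The following is a minimal sketch, not the sequence-level Stochastic Beam Search the paper proposes: it perturbs each log-probability with independent standard Gumbel noise and keeps the $k$ largest perturbed values. The names `gumbel_top_k`, `log_probs`, and `rng` are illustrative assumptions and do not come from the paper.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng=random):
    """Return k distinct indices sampled without replacement.

    Each (possibly unnormalized) log-probability is perturbed with
    independent standard Gumbel noise; the indices of the k largest
    perturbed values are returned in decreasing order of perturbed value.
    """
    if not 0 <= k <= len(log_probs):
        raise ValueError("k must be between 0 and the number of categories")

    perturbed = []
    for index, log_p in enumerate(log_probs):
        # rng.random() is in [0, 1); clamp away from 0 so log() is defined.
        u = max(rng.random(), 1e-300)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((log_p + gumbel_noise, index))

    # Taking only the single largest value recovers the Gumbel-Max trick.
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


if __name__ == "__main__":
    probs = [0.7, 0.2, 0.1]
    log_probs = [math.log(p) for p in probs]
    print(gumbel_top_k(log_probs, k=2, rng=random.Random(0)))
```

With `k=1` the sketch reduces to the ordinary Gumbel-Max trick, so the first returned index is distributed according to the categorical distribution itself. Sorting all perturbed values costs O(n log n); a partial selection would suffice when `k` is much smaller than the number of categories.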

Ancestral Gumbel-Top-k Sampling for Sampling Without Replacement

We develop ancestral Gumbel-Top-k sampling: a generic and efficient method for sampling without replacement from discrete-valued Bayesian networks, which includes multivariate discrete distributions…

Conditional Poisson Stochastic Beam Search

This work proposes a new method for turning beam search into a stochastic process, Conditional Poisson Stochastic Beam Search (CPSBS), and shows how samples generated under the CPSBS design can be used to build consistent estimators and sample diverse sets from sequence models.

A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning

This survey article presents background on the Gumbel-max trick, provides a structured overview of its extensions to ease algorithm selection, and gives a comprehensive outline of the (machine learning) literature in which Gumbel-based algorithms have been leveraged.

Reparameterizable Subset Sampling via Continuous Relaxations

A continuous relaxation of subset sampling is defined that provides reparameterization gradients by generalizing the Gumbel-max trick and is used to sample subsets of features in an instance-wise feature selection task for model interpretability, and sub-sequences of neighbors to implement parametric t-SNE by directly comparing the identities of local neighbors.

Incremental Sampling Without Replacement for Sequence Models

It is shown that incremental sampling without replacement is applicable to many domains, e.g., program synthesis and combinatorial optimization, and is efficient even for exponentially-large output spaces.

Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

This work presents a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model.

Latent Template Induction with Gumbel-CRFs

This work proposes the Gumbel-CRF, a continuous relaxation of the CRF sampling algorithm using a relaxed Forward-Filtering Backward-Sampling (FFBS) approach. It gives more stable gradients than score-function-based estimators and learns interpretable templates during training, which allows the decoder to be controlled during testing.

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

The Gumbel-Max trick is extended to distributions over structured domains and a family of recursive algorithms with a common feature the authors call stochastic invariant is highlighted, which allows us to construct reliable gradient estimates and control variates without additional constraints on the model.

Truncation Sampling as Language Model Desmoothing

Long samples of text from neural language models can be of poor quality. Truncation sampling algorithms, like top-p or top-k, address this by setting some words' probabilities to zero at each step.



Lost Relatives of the Gumbel Trick

An entire family of related methods, of which the Gumbel trick is one member, is derived, and it is shown that the new methods have superior properties in several settings with minimal additional computational cost.

Embed and Project: Discrete Sampling with Universal Hashing

This work proposes a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods.

A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models

This work proposes a new training procedure that focuses on the final loss metric (e.g. Hamming loss) evaluated on the output of beam search, and forms a sub-differentiable surrogate objective by introducing a novel continuous approximation of the beam search decoding procedure.

Randomized Optimum Models for Structured Prediction

This work explores a broader class of models, called Randomized Optimum models (RandOMs), which include Perturb-and-MAP models, and develops likelihood-based learning algorithms for RandOMs, which, empirical results indicate, can produce better models than PM.

Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models

A novel way to induce a random field from an energy function on discrete labels by locally injecting noise to the energy potentials, followed by finding the global minimum of the perturbed energy function is proposed.

Categorical Reparameterization with Gumbel-Softmax

It is shown that the Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.

Sequence-to-Sequence Learning as Beam-Search Optimization

This work introduces a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores, and shows that this system outperforms a highly-optimized attention-based seq2seq system and other baselines on three different sequence-to-sequence tasks: word ordering, parsing, and machine translation.

The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

Concrete random variables, continuous relaxations of discrete random variables, are introduced as a new family of distributions with closed-form densities and a simple reparameterization, and the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks is demonstrated.

Classical Structured Prediction Losses for Sequence to Sequence Learning

This work surveys a range of classical objective functions that have been widely used to train linear models for structured prediction, applies them to neural sequence-to-sequence models, and shows that these losses can perform surprisingly well, slightly outperforming beam search optimization in a like-for-like setup.

A* Sampling

This work shows how sampling from a continuous distribution can be converted into an optimization problem over continuous space and presents a new construction of the Gumbel process and A* Sampling, a practical generic sampling algorithm that searches for the maximum of a Gumbel process using A* search.