# Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity

```bibtex
@article{Correia2020EfficientMO,
  title   = {Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity},
  author  = {Gonçalo M. Correia and Vlad Niculae and W. Aziz and André F. T. Martins},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2007.01919}
}
```

Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new…
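The abstract contrasts sampling-based estimators with exact marginalization. As a rough illustration of the sparsity idea the paper builds on (the variable names and values below are illustrative, not taken from the paper), here is a minimal NumPy sketch of sparsemax (Martins & Astudillo, 2016), which maps scores to a probability vector with exact zeros, so an expectation over a categorical latent variable only needs to be summed over the surviving assignments:

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex (sparsemax);
    unlike softmax, the result can contain exact zeros."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum          # which sorted entries stay
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max        # simplex threshold
    return np.maximum(z - tau, 0.0)

scores = np.array([1.2, 1.0, -1.0, -3.0])        # scores for 4 latent assignments
losses = np.array([0.3, 0.7, 0.2, 0.9])          # per-assignment downstream loss

p = sparsemax(scores)                            # → [0.6, 0.4, 0.0, 0.0]
expected_loss = p @ losses                       # only 2 of the 4 terms contribute
```

With a softmax posterior every assignment contributes to the expectation; with a sparse posterior the marginalization touches only the support, which is what makes exact marginalization over large or combinatorial sets affordable.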

#### 7 Citations

Learning from Executions for Semantic Parsing

- Computer Science
- NAACL
- 2021

New training objectives, derived by approaching the problem of learning from executions from a posterior regularization perspective, outperform conventional methods on Overnight and GeoQuery, bridging the gap between semi-supervised and supervised learning.

A template for the arxiv style

- 2021

Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is…

Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

- Computer Science
- ArXiv
- 2021

Experiments suggest that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations, and that it simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers.

On Finding the K-best Non-projective Dependency Trees

- Computer Science
- ACL/IJCNLP
- 2021

This paper provides a simplification of the K-best spanning tree algorithm of Camerini et al. (1980) that allows for a constant-time speed-up over the original algorithm, and presents a novel extension of the algorithm for decoding the K-best dependency trees of a graph subject to a root constraint.

Reconciling the Discrete-Continuous Divide: Towards a Mathematical Theory of Sparse Communication

- Computer Science
- ArXiv
- 2021

This work introduces a new entropy function that includes the discrete and differential entropies as particular cases and has an interpretation in terms of code optimality, along with two information-theoretic counterparts that generalize the mutual information and the Kullback-Leibler divergence.

Storchastic: A Framework for General Stochastic Automatic Differentiation

- Mathematics, Computer Science
- ArXiv
- 2021

Storchastic is a new framework for automatic differentiation of stochastic computation graphs that allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step, to optimally reduce the variance of the gradient estimates.

Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders

- Computer Science
- NeurIPS
- 2020

Experiments on diverse tasks, such as image generation and human behavior prediction, demonstrate the effectiveness of the proposed technique at reducing the discrete latent sample space size of a model while maintaining its learned multimodality.

#### References

Showing 1-10 of 77 references

Auto-Encoding Variational Bayes

- Mathematics, Computer Science
- ICLR
- 2014

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

- Computer Science, Mathematics
- NIPS
- 2017

This work introduces a modification to the continuous relaxation of discrete variables and shows that the tightness of the relaxation can be adapted online, removing it as a hyperparameter, leading to faster convergence to a better final log-likelihood.

Categorical Reparameterization with Gumbel-Softmax

- Mathematics, Computer Science
- ICLR
- 2017

It is shown that the Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.
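As a quick sketch of the estimator summarized above (logits and temperature values are illustrative), a Gumbel-Softmax sample is a temperature-controlled softmax over noise-perturbed logits; as the temperature tends to zero, the sample approaches a one-hot categorical draw:

```python
import numpy as np

def gumbel_softmax_sample(logits, tau, rng):
    """Relaxed one-hot sample: perturb logits with Gumbel(0, 1) noise,
    then apply a softmax with temperature tau."""
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u))                  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max())                  # numerically stable softmax
    return y / y.sum()

rng = np.random.default_rng(0)
sample = gumbel_softmax_sample(np.array([1.0, 0.5, -0.5]), tau=0.5, rng=rng)
# sample lies on the probability simplex; lower tau pushes it toward one-hot
```

Because the sample is a differentiable function of the logits, gradients can flow through it by reparameterization, which is the lower-variance alternative to the score function estimator mentioned in the abstract.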

Neural Variational Inference and Learning in Belief Networks

- Computer Science, Mathematics
- ICML
- 2014

This work proposes a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior and shows that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.

A Tutorial on Deep Latent Variable Models of Natural Language

- Computer Science, Mathematics
- ArXiv
- 2018

Through the lens of variational inference, this tutorial explores in depth how to parameterize conditional likelihoods in latent variable models with powerful function approximators.

SparseMAP: Differentiable Sparse Structured Inference

- Computer Science, Mathematics
- ICML
- 2018

This work introduces SparseMAP, a new method for sparse structured inference, together with its natural loss function; experiments reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems.

Interpretable Neural Predictions with Differentiable Binary Variables

- Computer Science
- ACL
- 2019

This work proposes a latent model that mixes discrete and continuous behaviour, allowing both binary selections and gradient-based training without REINFORCE; it can tractably compute the expected value of penalties such as L0, which allows directly optimising the model towards a pre-specified text selection rate.

Towards Dynamic Computation Graphs via Sparse Latent Structure

- Computer Science, Mathematics
- EMNLP
- 2018

This work proposes a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor, and is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability.

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

- Computer Science, Mathematics
- ArXiv
- 2013

This work considers a small-scale version of *conditional computation*, where sparse stochastic units form a distributed representation of gaters that can turn off, in combinatorially many ways, large chunks of the computation performed in the rest of the neural network.

A Regularized Framework for Sparse and Structured Neural Attention

- Computer Science, Mathematics
- NIPS
- 2017

This paper proposes a new framework for sparse and structured attention, building upon a smoothed max operator, and shows that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism.