Corpus ID: 220363570

# Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity

@article{Correia2020EfficientMO,
title={Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity},
author={Gonçalo M. Correia and Vlad Niculae and W. Aziz and Andr{\'e} F. T. Martins},
journal={ArXiv},
year={2020},
volume={abs/2007.01919}
}
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new…
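The sparsity idea the title refers to can be sketched with sparsemax (Martins & Astudillo, 2016), which projects scores onto the probability simplex and drives most entries to exactly zero; an expectation under such a sparse posterior can then be computed exactly by summing only over its small support. This is a minimal illustrative sketch, not the paper's implementation; `sparse_expectation` and the toy score vector are assumptions.

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex; many entries become exactly 0."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    k_z = int((1 + k * z_sorted > cumsum).sum())  # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z             # simplex-projection threshold
    return np.maximum(z - tau, 0.0)

def sparse_expectation(scores, f):
    """Exact E_q[f(z)] for q = sparsemax(scores), summing only over q's support."""
    q = sparsemax(scores)
    support = np.nonzero(q)[0]
    return sum(q[i] * f(i) for i in support)
```

With a peaked score vector the support often collapses to a handful of states, so the "marginalization" touches only a few terms instead of the full categorical set.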

#### Citations

Learning from Executions for Semantic Parsing
• Computer Science
• NAACL
• 2021
A set of new training objectives, derived by approaching learning from executions from a posterior-regularization perspective, outperforms conventional methods on Overnight and GeoQuery, bridging the gap between semi-supervised and supervised learning.
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
• Computer Science
• ArXiv
• 2021
Experiments suggest that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations, and that it simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers.
On Finding the K-best Non-projective Dependency Trees
• Computer Science
• ACL/IJCNLP
• 2021
This paper provides a simplification of the K-best spanning tree algorithm of Camerini et al. (1980) that allows for a constant-time speed-up over the original algorithm, and presents a novel extension of the algorithm for decoding the K-best dependency trees of a graph subject to a root constraint.
Reconciling the Discrete-Continuous Divide: Towards a Mathematical Theory of Sparse Communication
Introduces a new entropy function that includes the discrete and differential entropies as particular cases and has an interpretation in terms of code optimality, together with counterparts that generalize the mutual information and the Kullback-Leibler divergence.
Storchastic: A Framework for General Stochastic Automatic Differentiation
• Mathematics, Computer Science
• ArXiv
• 2021
Storchastic is a new framework for automatic differentiation of stochastic computation graphs that allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step, to optimally reduce the variance of the gradient estimates.
Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders
• Computer Science
• NeurIPS
• 2020
Experiments on diverse tasks, such as image generation and human behavior prediction, demonstrate the effectiveness of the proposed technique at reducing the discrete latent sample space size of a model while maintaining its learned multimodality.

#### References

Showing 1–10 of 77 references.
Auto-Encoding Variational Bayes
• Mathematics, Computer Science
• ICLR
• 2014
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
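The reparameterization trick at the heart of this reference can be sketched in a few lines: instead of sampling z directly from N(mu, sigma²), one samples noise and transforms it deterministically, so gradients flow to the Gaussian parameters. The function name and NumPy setting are illustrative, not from the paper.

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """Sample z = mu + sigma * eps with eps ~ N(0, I); differentiable in mu and log_var."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(np.shape(mu))
    return np.asarray(mu) + np.exp(0.5 * np.asarray(log_var)) * eps
```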
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
• Computer Science, Mathematics
• NIPS
• 2017
This work introduces a modification to the continuous relaxation of discrete variables and shows that the tightness of the relaxation can be adapted online, removing it as a hyperparameter, leading to faster convergence to a better final log-likelihood.
Categorical Reparameterization with Gumbel-Softmax
• Mathematics, Computer Science
• ICLR
• 2017
It is shown that the Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.
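The Gumbel-Softmax trick can be sketched directly: perturb the logits with Gumbel noise and apply a tempered softmax, so that low temperatures approach a one-hot sample while the whole expression stays differentiable. A minimal NumPy sketch, with function name and defaults as assumptions:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed sample from Categorical(softmax(logits)); tau -> 0 gives near one-hot."""
    rng = rng or np.random.default_rng(0)
    gumbel = -np.log(-np.log(rng.uniform(size=len(logits))))
    y = (np.asarray(logits) + gumbel) / tau
    y = y - y.max()                 # numerically stable softmax
    e = np.exp(y)
    return e / e.sum()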
Neural Variational Inference and Learning in Belief Networks
• Computer Science, Mathematics
• ICML
• 2014
This work proposes a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior and shows that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
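The score-function (REINFORCE) gradient that this line of work variance-reduces can be sketched for a categorical posterior parameterized by logits: sample z, weight the score ∇ log p(z) by f(z), optionally subtracting a baseline. Function names and the baseline argument are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def score_function_grad(logits, f, rng=None, baseline=0.0):
    """Single-sample REINFORCE estimate of d E_{z~softmax(logits)}[f(z)] / d logits."""
    rng = rng or np.random.default_rng(0)
    p = softmax(np.asarray(logits, dtype=float))
    z = rng.choice(len(p), p=p)
    grad_log_p = np.eye(len(p))[z] - p   # d log p(z) / d logits for a softmax
    return (f(z) - baseline) * grad_log_p
```

The estimator is unbiased but noisy, which is exactly why baselines (as in NVIL) or exact sparse marginalization (as in the main paper) are attractive.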
A Tutorial on Deep Latent Variable Models of Natural Language
• Computer Science, Mathematics
• ArXiv
• 2018
This tutorial explores, through the lens of variational inference, how to parameterize conditional likelihoods in latent variable models with powerful function approximators.
SparseMAP: Differentiable Sparse Structured Inference
• Computer Science, Mathematics
• ICML
• 2018
This work introduces SparseMAP, a new method for sparse structured inference, together with its natural loss function; experiments reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems.
Interpretable Neural Predictions with Differentiable Binary Variables
• Computer Science
• ACL
• 2019
This work proposes a latent model that mixes discrete and continuous behaviour allowing at the same time for binary selections and gradient-based training without REINFORCE, and can tractably compute the expected value of penalties such as L0, which allows it to directly optimise the model towards a pre-specified text selection rate.
Towards Dynamic Computation Graphs via Sparse Latent Structure
• Computer Science, Mathematics
• EMNLP
• 2018
This work proposes a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor, and is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability.
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
• Computer Science, Mathematics
• ArXiv
• 2013
This work considers a small-scale version of *conditional computation*, where sparse stochastic units form a distributed representation of gaters that can turn off in combinatorially many ways large chunks of the computation performed in the rest of the neural network.
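The straight-through estimator proposed in this reference can be sketched as a pair of forward/backward rules: take a hard 0/1 decision in the forward pass, and pretend the threshold was the identity in the backward pass. The explicit two-function form (rather than an autodiff framework) is an illustrative assumption.

```python
import numpy as np

def st_threshold_forward(p):
    """Forward pass: hard binary decision for each unit."""
    return (np.asarray(p) > 0.5).astype(float)

def st_threshold_backward(grad_out):
    """Backward pass: pass gradients straight through, as if the threshold were identity."""
    return np.asarray(grad_out)
```

Biased but low-variance, this heuristic is a common baseline against the score-function and relaxation-based estimators discussed above.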
A Regularized Framework for Sparse and Structured Neural Attention
• Computer Science, Mathematics
• NIPS
• 2017
This paper proposes a new framework for sparse and structured attention, building upon a smoothed max operator, and shows that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism.