Corpus ID: 220363570

Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity

  title={Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity},
  author={Gonçalo M. Correia and Vlad Niculae and W. Aziz and André F. T. Martins},
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new…
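As a rough illustration of the trade-off the abstract describes, the sketch below (a toy categorical latent with a small, enumerable number of states; all names are hypothetical) contrasts the exact marginal with the sampling-based approximation one typically falls back on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: categorical latent z with K states and a per-state loss f(z).
K = 4
logits = rng.normal(size=K)
p = np.exp(logits - logits.max())
p /= p.sum()                      # p(z), the latent distribution
f = rng.normal(size=K)            # downstream loss f(z) for each state

# Exact marginalization E_p[f(z)] -- only feasible when K is enumerable.
exact = p @ f

# Sampling-based approximation -- the usual fallback when K is combinatorial.
samples = rng.choice(K, size=10_000, p=p)
approx = f[samples].mean()
```

With enough samples the Monte Carlo estimate is close to the exact marginal, but each individual sample yields a noisy gradient signal, which is what motivates the sparsity-based alternative proposed in the paper.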
Learning from Executions for Semantic Parsing
A set of new training objectives that are derived by approaching the problem of learning from executions from the posterior regularization perspective outperform conventional methods on Overnight and GeoQuery, bridging the gap between semi-supervised and supervised learning.
A template for the arxiv style
Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is…
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
Experiments suggest that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations, and that it simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers.
On Finding the K-best Non-projective Dependency Trees
This paper provides a simplification of the K-best spanning tree algorithm of Camerini et al. (1980) that allows for a constant-time speed-up over the original algorithm, and presents a novel extension of the algorithm for decoding the K-best dependency trees of a graph which are subject to a root constraint.
Reconciling the Discrete-Continuous Divide: Towards a Mathematical Theory of Sparse Communication
A new entropy function is introduced that includes the discrete and differential entropies as particular cases and has an interpretation in terms of code optimality, along with two information-theoretic counterparts that generalize the mutual information and the Kullback-Leibler divergence.
Storchastic: A Framework for General Stochastic Automatic Differentiation
Storchastic is a new framework for automatic differentiation of stochastic computation graphs that allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step, to optimally reduce the variance of the gradient estimates.
Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders
Experiments on diverse tasks, such as image generation and human behavior prediction, demonstrate the effectiveness of the proposed technique at reducing the discrete latent sample space size of a model while maintaining its learned multimodality.


Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
This work introduces a modification to the continuous relaxation of discrete variables and shows that the tightness of the relaxation can be adapted online, removing it as a hyperparameter, leading to faster convergence to a better final log-likelihood.
Categorical Reparameterization with Gumbel-Softmax
It is shown that the Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.
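The estimator's sampling step admits a minimal NumPy sketch (function and parameter names are hypothetical; the temperature `tau` controls how close the relaxed sample is to a one-hot vector):

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=0.5, rng=None):
    """Relaxed one-hot sample from Categorical(softmax(logits))."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))        # Gumbel(0, 1) noise
    y = (logits + g) / tau         # low tau -> sample approaches one-hot
    y = np.exp(y - y.max())        # numerically stable softmax
    return y / y.sum()
```

The output lies on the probability simplex, so it can be fed through downstream computations while remaining differentiable with respect to the logits.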
Neural Variational Inference and Learning in Belief Networks
This work proposes a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior and shows that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
A Tutorial on Deep Latent Variable Models of Natural Language
This tutorial explores in depth, through the lens of variational inference, how to parameterize conditional likelihoods in latent variable models with powerful function approximators.
SparseMAP: Differentiable Sparse Structured Inference
This work introduces SparseMAP, a new method for sparse structured inference, together with its natural loss function; experiments reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems.
Interpretable Neural Predictions with Differentiable Binary Variables
This work proposes a latent model that mixes discrete and continuous behaviour, allowing both binary selections and gradient-based training without REINFORCE, and can tractably compute the expected value of penalties such as L0, which allows it to directly optimise the model towards a pre-specified text selection rate.
Towards Dynamic Computation Graphs via Sparse Latent Structure
This work proposes a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor, and is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability.
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
This work considers a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off, in combinatorially many ways, large chunks of the computation performed in the rest of the neural network.
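The straight-through trick this line of work popularized can be sketched without an autodiff framework (NumPy; function names are hypothetical): the forward pass samples hard 0/1 gates, while the backward pass copies the gradient as if the non-differentiable sampler were the identity.

```python
import numpy as np

def binary_gate_forward(p, rng):
    """Sample hard 0/1 gates with firing probabilities p."""
    return (rng.uniform(size=p.shape) < p).astype(float)

def binary_gate_backward(grad_out):
    """Straight-through estimator: pass the gradient through unchanged,
    treating the hard threshold/sampler as the identity."""
    return grad_out
```

This estimator is biased, but it is cheap and often works well in practice, which is why it remains a common baseline for training networks with stochastic binary units.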
A Regularized Framework for Sparse and Structured Neural Attention
This paper proposes a new framework for sparse and structured attention, building upon a smoothed max operator, and shows that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism.
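One concrete instance of such a mapping from real values to sparse probabilities is sparsemax (Martins & Astudillo, 2016), the Euclidean projection onto the probability simplex; a minimal NumPy sketch (this is one special case, not the paper's general smoothed-max framework):

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex, yielding sparse weights."""
    z_sorted = np.sort(z)[::-1]            # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum    # which entries stay nonzero
    k_max = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_max  # threshold
    return np.maximum(z - tau, 0.0)
```

Unlike softmax, which always assigns nonzero weight everywhere, sparsemax can assign exactly zero to low-scoring entries, which is what makes the resulting attention distributions sparse and interpretable.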