VarGrad: A Low-Variance Gradient Estimator for Variational Inference

@article{Richter2020VarGradAL,
  title={VarGrad: A Low-Variance Gradient Estimator for Variational Inference},
  author={Lorenz Richter and Ayman Boustati and Nikolas N{\"u}sken and Francisco J. R. Ruiz and {\"O}mer Deniz Akyildiz},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.10436}
}
We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the $\textit{log-variance loss}$. Under certain conditions, the gradient of the log-variance loss equals the gradient of the (negative… 
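The log-variance loss described in the abstract admits a compact implustration in code: draw samples from the variational approximation, detach them, and take the sample variance of the log-ratio; automatic differentiation of that quantity then yields the VarGrad estimator. The sketch below in JAX is only an illustration of this idea under simplifying assumptions; the diagonal-Gaussian variational family, the toy standard-Gaussian target, and the names log_q, log_p and log_variance_loss are ours, not the paper's.

# Minimal JAX sketch of the log-variance loss (an illustration, not the paper's code).
# Samples are detached so that differentiating the sample variance of the log-ratio
# recovers the score-function leave-one-out (VarGrad) estimator of the ELBO gradient,
# up to a constant factor.
import jax
import jax.numpy as jnp

def log_q(params, z):
    # Log-density of a diagonal-Gaussian variational approximation q_theta(z).
    mean, log_std = params
    return jnp.sum(-0.5 * jnp.log(2.0 * jnp.pi) - log_std
                   - 0.5 * ((z - mean) / jnp.exp(log_std)) ** 2)

def log_p(z):
    # Toy unnormalised target log p(x, z): a standard Gaussian (assumption).
    return jnp.sum(-0.5 * jnp.log(2.0 * jnp.pi) - 0.5 * z ** 2)

def log_variance_loss(params, key, num_samples=16):
    # Sample variance of log q_theta(z) - log p(x, z) over detached samples.
    mean, log_std = params
    eps = jax.random.normal(key, (num_samples,) + mean.shape)
    z = jax.lax.stop_gradient(mean + jnp.exp(log_std) * eps)
    log_ratio = jax.vmap(lambda zi: log_q(params, zi) - log_p(zi))(z)
    return jnp.var(log_ratio, ddof=1)

params = (jnp.array([1.0, -0.5]), jnp.array([0.3, 0.1]))
grad_estimate = jax.grad(log_variance_loss)(params, jax.random.PRNGKey(0))
print(grad_estimate)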
Double Control Variates for Gradient Estimation in Discrete Latent Variable Models
TLDR
This work develops a double control variate for the REINFORCE leave-one-out estimator using Taylor expansions and shows that this estimator can have lower variance compared to other state-of-the-art estimators.
CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator
TLDR
This work proposes CARMS, an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples, and finds that it outperforms competing methods, including a strong self-control baseline.
Gradient Estimation with Discrete Stein Operators
TLDR
In benchmark generative modeling tasks such as training binary variational autoencoders, the gradient estimator achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
Bayesian Learning via Neural Schrödinger-Föllmer Flows
TLDR
A new framework for approximate Bayesian inference in large datasets based on stochastic control is explored and the existing theoretical guarantees of this framework are discussed and adapted.
Coupled Gradient Estimators for Discrete Latent Variables
TLDR
Gradient estimators based on reparameterizing categorical variables as sequences of binary variables and on Rao-Blackwellization are introduced, and the proposed categorical gradient estimators are shown to provide state-of-the-art performance.
Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces
TLDR
The Gumbel-Max trick is extended to define distributions over structured domains, and a family of recursive algorithms sharing a property the authors call the stochastic invariant is highlighted, which allows reliable gradient estimates and control variates to be constructed without additional constraints on the model.
Nonasymptotic bounds for suboptimal importance sampling
TLDR
This article provides nonasymptotic lower and upper bounds on the relative error in importance sampling that depend on the deviation of the actual proposal from optimality, and identifies potential robustness issues that importance sampling may have, especially in high dimensions.
Combating the Instability of Mutual Information-based Losses via Regularization
TLDR
This work identifies the symptoms behind the instability of MI-based losses, mitigates them by adding a novel regularization term to the existing losses, and demonstrates theoretically and experimentally that the added regularization stabilizes training.

References

SHOWING 1-10 OF 53 REFERENCES
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Human-level concept learning through probabilistic program induction
TLDR
A computational model is described that learns in a similar fashion and does so better than current deep learning algorithms and can generate new letters of the alphabet that look “right” as judged by Turing-like tests of the model's output in comparison to what real humans produce.
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
TLDR
This work introduces a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables, and gives an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
TLDR
This work introduces a modification to the continuous relaxation of discrete variables and shows that the tightness of the relaxation can be adapted online, removing it as a hyperparameter, leading to faster convergence to a better final log-likelihood.
Simple statistical gradient-following algorithms for connectionist reinforcement learning
TLDR
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units; these algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, without explicitly computing gradient estimates (a minimal sketch of this score-function estimator with a leave-one-out baseline appears after this reference list).
ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks
TLDR
Experimental results show the augmented-REINFORCE-merge estimator provides state-of-the-art performance in auto-encoding variational inference and maximum likelihood estimation, for discrete latent variable models with one or multiple stochastic binary layers.
Buy 4 REINFORCE Samples, Get a Baseline for Free!
  • ICLR Workshop on Deep Reinforcement Learning Meets Structured Prediction
  • 2019
Operator Variational Inference
TLDR
A black box algorithm, operator variational inference (OPVI), for optimizing any operator objective, which can characterize different properties of variational objectives, such as objectives that admit data subsampling (allowing inference to scale to massive data) as well as objectives that admit variational programs, a rich class of posterior approximations that does not require a tractable density.
Black Box Variational Inference
TLDR
This paper presents a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation, based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution.
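For reference, the sketch below illustrates the REINFORCE leave-one-out baseline referred to in several of the entries above: each sample's baseline is the average objective value of the remaining samples, which keeps the estimator unbiased while reducing its variance. The Bernoulli model, the quadratic objective, and the function name reinforce_loo are illustrative assumptions, not taken from any of the cited papers.

# Minimal JAX sketch of a REINFORCE estimator with a leave-one-out baseline
# (an illustration under the assumptions stated above).
import jax
import jax.numpy as jnp

def reinforce_loo(key, logits, f, num_samples=8):
    # Estimates d/d logits of E_{z ~ Bernoulli(sigmoid(logits))}[f(z)].
    probs = jax.nn.sigmoid(logits)
    z = jax.random.bernoulli(key, probs, (num_samples,) + logits.shape)
    z = z.astype(jnp.float32)
    fz = jax.vmap(f)(z)                                 # objective value per sample
    score = z - probs                                   # d log p(z) / d logits (Bernoulli)
    baseline = (jnp.sum(fz) - fz) / (num_samples - 1)   # leave-one-out mean of f
    return jnp.mean((fz - baseline)[:, None] * score, axis=0)

grad_est = reinforce_loo(jax.random.PRNGKey(0), jnp.array([0.2, -1.0]),
                         lambda z: jnp.sum((z - 1.0) ** 2))
print(grad_est)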