Corpus ID: 247011307

Gradient Estimation with Discrete Stein Operators

Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester W. Mackey
Gradient estimation—approximating the gradient of an expectation with respect to the parameters of a distribution—is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein operators for discrete distributions. We then use this technique to build flexible control variates for the… 
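To make the setting concrete, here is a minimal sketch of score-function (REINFORCE) gradient estimation for a categorical distribution with a simple mean baseline as a control variate. This is a generic illustration of variance reduction via control variates, not the paper's Stein-operator construction; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def grad_estimate(logits, f, n_samples=1000, baseline=True):
    """Estimate d/d(logits) E_{x ~ Cat(softmax(logits))}[f(x)]."""
    p = softmax(logits)
    xs = rng.choice(len(p), size=n_samples, p=p)
    fx = np.array([f(x) for x in xs])
    # Mean baseline as a crude control variate (a leave-one-out
    # baseline would keep the estimator exactly unbiased).
    b = fx.mean() if baseline else 0.0
    # Score of a categorical: d log p(x) / d(logits) = onehot(x) - p.
    scores = np.eye(len(p))[xs] - p
    return ((fx - b)[:, None] * scores).mean(axis=0)

logits = np.array([0.2, -0.5, 1.0])
f = lambda x: float(x == 2)  # so E[f] = p[2]
g = grad_estimate(logits, f)
```

Because each per-sample score vector sums to zero, the estimated gradient also sums to zero, matching the fact that the softmax probabilities always sum to one.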
Generalised Bayesian Inference for Discrete Intractable Likelihood
The main idea is to update beliefs about model parameters using a discrete Fisher divergence, in lieu of the problematic intractable likelihood, creating a generalised posterior that can be sampled with standard computational tools, circumventing the intractable normalising constant.
Estimating Gradients for Discrete Random Variables by Sampling without Replacement
This work derives an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples and is closely related to other gradient estimators.
Local Expectation Gradients for Black Box Variational Inference
This algorithm divides the problem of estimating the stochastic gradients over multiple variational parameters into smaller sub-tasks so that each sub-task efficiently explores the most relevant part of the variational distribution.
Double Control Variates for Gradient Estimation in Discrete Latent Variable Models
This work develops a double control variate for the REINFORCE leave-one-out estimator using Taylor expansions and shows that this estimator can have lower variance compared to other state-of-the-art estimators.
Minimum Probability Flow Learning
This work proposes a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model, and demonstrates parameter estimation in Ising models, deep belief networks and an independent component analysis model of natural scenes.
Gradient Estimation Using Stochastic Computation Graphs
This work introduces the formalism of stochastic computation graphs—directed acyclic graphs that include both deterministic functions and conditional probability distributions—and describes how to easily and automatically derive an unbiased estimator of the loss function's gradient.
VarGrad: A Low-Variance Gradient Estimator for Variational Inference
It is empirically demonstrated that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE.
Evaluating the Variance of Likelihood-Ratio Gradient Estimators
This study establishes a novel framework of gradient estimation that includes most common gradient estimators as special cases and gives a natural derivation of the optimal estimator, which can be interpreted as a special case of the likelihood-ratio method, so that practical variance-reduction techniques can be evaluated against this optimum.
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
This work introduces a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables, and gives an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
Monte Carlo Gradient Estimation in Machine Learning
A broad and accessible survey of methods for Monte Carlo gradient estimation in machine learning and across the statistical sciences, covering three strategies--the pathwise, score-function, and measure-valued gradient estimators--and exploring their historical development, derivation, and underlying assumptions.
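Two of the surveyed strategies can be contrasted on a toy Gaussian example: both the pathwise and score-function estimators target the same gradient, but the score-function estimator typically has much higher variance. A minimal sketch (illustrative assumptions: x ~ N(mu, 1), f(x) = x^2, so the true gradient of E[f] with respect to mu is 2*mu):

```python
import numpy as np

rng = np.random.default_rng(1)

mu, n = 1.5, 5000
f = lambda x: x ** 2  # E[f(x)] = mu^2 + 1 for x ~ N(mu, 1)
eps = rng.standard_normal(n)

# Pathwise (reparameterization): write x = mu + eps and differentiate
# through f, giving f'(x) * dx/dmu = 2x per sample.
x = mu + eps
pathwise = (2 * x).mean()

# Score function (REINFORCE): f(x) * d log p(x; mu) / dmu = f(x) * (x - mu).
score = (f(x) * (x - mu)).mean()
```

Both averages converge to 2*mu = 3.0, but the per-sample variance of the score-function terms is an order of magnitude larger here, which is the motivation for the control-variate and variance-reduction methods listed above.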
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.