Corpus ID: 220968895

Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds

Valentin Liévin, Andrea Dittadi, Anders Christensen, Ole Winther

This paper introduces novel results for the score function gradient estimator of the importance weighted variational bound (IWAE). We prove that, in the limit of large $K$ (the number of importance samples), one can choose the control variate such that the signal-to-noise ratio (SNR) of the estimator grows as $\sqrt{K}$. This is in contrast to the standard pathwise gradient estimator, where the SNR decreases as $1/\sqrt{K}$. Based on our theoretical findings we develop a novel control variate that… 
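
To make the two quantities in the abstract concrete, here is a minimal sketch under the usual definitions: the $K$-sample importance weighted bound is estimated as the log of the average importance weight, and the SNR of a scalar gradient component is the absolute mean divided by the standard deviation. The function names `log_mean_exp` and `snr` are hypothetical helpers for illustration. This is not the paper's control variate.

```python
import math

def log_mean_exp(log_weights):
    """Stable log((1/K) * sum_k exp(log_w_k)): a one-draw estimate of the K-sample bound."""
    m = max(log_weights)  # subtract the max to avoid overflow in exp
    total = sum(math.exp(lw - m) for lw in log_weights)
    return m + math.log(total / len(log_weights))

def snr(gradient_samples):
    """Empirical signal-to-noise ratio |mean| / std of scalar gradient estimates."""
    n = len(gradient_samples)
    mean = sum(gradient_samples) / n
    var = sum((g - mean) ** 2 for g in gradient_samples) / (n - 1)  # unbiased variance
    return abs(mean) / math.sqrt(var)

# Sanity check: K identical log-weights return that same value.
print(log_mean_exp([-1.0] * 8))
```

In practice, `snr` would be applied to repeated draws of one gradient component at a fixed parameter value, once per choice of $K$, to compare how the ratio scales for different estimators.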


Uphill Roads to Variational Tightness: Monotonicity and Monte Carlo Objectives

We revisit the theory of importance weighted variational inference (IWVI), a promising strategy for learning latent variable models. IWVI uses new variational bounds, known as Monte Carlo objectives

Planning From Pixels in Atari With Learned Symbolic Representations

This work leverages variational autoencoders (VAEs) to learn features directly from pixels in a principled manner and without supervision. The resulting combination outperforms the original RolloutIW and human professional play on Atari 2600, and drastically reduces the size of the feature set.



Variational Inference for Monte Carlo Objectives

The first unbiased gradient estimator designed for importance-sampled objectives is developed, which is both simpler and more effective than the NVIL estimator proposed for the single-sample variational objective, and is competitive with the currently used biased estimators.

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives

A computationally efficient, unbiased drop-in gradient estimator that reduces the variance of the IWAE gradient, the reweighted wake-sleep update (RWS), and the jackknife variational inference (JVI) gradient (Nowozin, 2018).

Tighter Variational Bounds are Not Necessarily Better

We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

This work introduces a modification to the continuous relaxation of discrete variables and shows that the tightness of the relaxation can be adapted online, removing it as a hyperparameter, leading to faster convergence to a better final log-likelihood.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Monte Carlo Gradient Estimation in Machine Learning

A broad and accessible survey of the methods for Monte Carlo gradient estimation in machine learning and across the statistical sciences, exploring three strategies (the pathwise, score function, and measure-valued gradient estimators) along with their historical development, derivation, and underlying assumptions.

Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference

We propose a simple and general variant of the standard reparameterized gradient estimator for the variational evidence lower bound. Specifically, we remove a part of the total derivative with

Auto-Encoding Variational Bayes

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

Revisiting Reweighted Wake-Sleep

The reweighted wake-sleep (RWS) algorithm is revisited, and it is shown that it circumvents both these issues, outperforming current state-of-the-art methods in learning discrete latent-variable models.

On the quantitative analysis of deep belief networks

It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBMs with different architectures is presented.