# Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

@article{Zhang2021DifferentiableAI, title={Differentiable Annealed Importance Sampling and the Perils of Gradient Noise}, author={Guodong Zhang and Kyle Hsu and Jianing Li and Chelsea Finn and Roger B. Grosse}, journal={ArXiv}, year={2021}, volume={abs/2107.10211} }

Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings correction steps. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose Differentiable AIS (DAIS), a variant of AIS which ensures differentiability by abandoning the…
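The core idea the abstract describes, running annealed importance weighting with transitions that omit the Metropolis-Hastings correction so every operation is differentiable, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the 1-D Gaussian target, the geometric annealing path, and all constants are assumptions chosen for clarity.

```python
import numpy as np

# Sketch: annealed importance weights with *uncorrected* (no MH
# accept/reject) Langevin transitions. Target and schedule are
# illustrative, not taken from the paper.

rng = np.random.default_rng(0)

def log_prior(x):          # unnormalized log N(0, 1) initial distribution
    return -0.5 * x**2

def log_target(x):         # unnormalized log N(3, 0.5^2) target
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def log_gamma(x, beta):    # geometric annealing path between the two
    return (1 - beta) * log_prior(x) + beta * log_target(x)

def grad_log_gamma(x, beta):
    return (1 - beta) * (-x) + beta * (-(x - 3.0) / 0.25)

def dais_log_weights(n_particles=2000, n_steps=200, step=0.05):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_particles)   # exact samples from the prior
    logw = np.zeros(n_particles)
    for t in range(1, n_steps + 1):
        # AIS weight increment, evaluated at the previous state
        logw += log_gamma(x, betas[t]) - log_gamma(x, betas[t - 1])
        # one unadjusted Langevin step targeting gamma_t: differentiable,
        # but biased because the MH correction is dropped
        noise = rng.standard_normal(n_particles)
        x = x + step * grad_log_gamma(x, betas[t]) + np.sqrt(2 * step) * noise
    return logw

logw = dais_log_weights()
# estimate of the log normalizing-constant ratio via log-mean-exp
log_z = np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
```

Because every step is a smooth function of the particles, the whole estimator can in principle be differentiated with respect to path or transition parameters, which is exactly what the MH accept/reject step in standard AIS prevents.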

## 3 Citations

Surrogate Likelihoods for Variational Annealed Importance Sampling

- Computer Science, Mathematics · ArXiv
- 2021

This work argues theoretically that the resulting algorithm permits the user to make an intuitive trade-off between inference fidelity and computational cost, and shows that the method performs well in practice and is well-suited for black-box inference in probabilistic programming frameworks.

MCMC Variational Inference via Uncorrected Hamiltonian Annealing

- Computer Science, Mathematics · ArXiv
- 2021

It is shown empirically that the proposed framework, which uses an AIS-like procedure with uncorrected Hamiltonian MCMC, yields better performance than competing approaches, and that the ability to tune its parameters using reparameterization gradients may lead to large performance improvements.

Frequency-Domain Representation of First-Order Methods: A Simple and Robust Framework of Analysis

- Mathematics · Symposium on Simplicity in Algorithms (SOSA)
- 2022

Motivated by recent applications in min-max optimization, we employ tools from nonlinear control theory in order to analyze a class of “historical” gradient-based methods, for which the next step…

## References

Showing 1–10 of 57 references

Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms

- Computer Science, Mathematics · AISTATS
- 2017

This work proposes a new method that lets us leverage reparameterization gradients even when variables are outputs of an acceptance-rejection sampling algorithm, and shows that the variance of the estimator of the gradient is significantly lower than other state-of-the-art methods, leading to faster convergence of stochastic gradient variational inference.

Annealed importance sampling

- Mathematics, Physics · Stat. Comput.
- 2001

It is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler, which can be seen as a generalization of a recently proposed variant of sequential importance sampling.
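The Metropolis-Hastings correction inside a standard AIS transition is the step DAIS removes, since its discrete accept/reject decision blocks gradients. A minimal sketch of such an MH-corrected transition, with a hypothetical 1-D Gaussian target rather than anything from the paper:

```python
import numpy as np

# Sketch of a Metropolis-Hastings transition as used inside a standard
# AIS chain. The discrete accept/reject decision is what makes AIS
# non-differentiable; the target below is purely illustrative.

rng = np.random.default_rng(1)

def log_gamma(x):
    return -0.5 * ((x - 3.0) / 0.5) ** 2   # unnormalized annealed density

def mh_step(x, step=0.3):
    prop = x + step * rng.standard_normal(x.shape)     # random-walk proposal
    log_accept = log_gamma(prop) - log_gamma(x)
    accept = np.log(rng.random(x.shape)) < log_accept  # discrete decision
    return np.where(accept, prop, x)                   # not differentiable

x = rng.standard_normal(5000)   # 5000 independent chains
for _ in range(500):
    x = mh_step(x)
# the chains should now approximately sample N(3, 0.5^2)
```

The `np.where(accept, prop, x)` branch leaves the target distribution exactly invariant, which is the guarantee lost when the correction is dropped in exchange for differentiability.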

Theoretical guarantees for approximate sampling from smooth and log-concave densities

- Mathematics
- 2014

Sampling from various kinds of distributions is an issue of paramount importance in statistics since it is often the key ingredient for constructing estimators, test procedures or confidence…

MCMC Variational Inference via Uncorrected Hamiltonian Annealing

- Computer Science, Mathematics · ArXiv
- 2021

It is shown empirically that the proposed framework, which uses an AIS-like procedure with uncorrected Hamiltonian MCMC, yields better performance than competing approaches, and that the ability to tune its parameters using reparameterization gradients may lead to large performance improvements.

Stochastic Gradient Descent as Approximate Bayesian Inference

- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2017

It is demonstrated that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models, and a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler, is proposed.

Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics

- Mathematics, Computer Science · J. Mach. Learn. Res.
- 2016

This article proves that, under verifiable assumptions, the SGLD algorithm is consistent and satisfies a central limit theorem (CLT), and that its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence (δ_m)_{m≥0}.
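The SGLD algorithm this entry analyzes is simple to state: a Langevin update driven by a minibatch gradient with a decaying step-size sequence. A minimal sketch under assumed details (a Gaussian mean-estimation model and a polynomially decaying schedule, neither taken from the paper):

```python
import numpy as np

# Sketch: stochastic gradient Langevin dynamics (SGLD) with a decaying
# step-size sequence delta_m. Model and constants are illustrative.

rng = np.random.default_rng(2)
data = rng.normal(2.0, 1.0, size=1000)   # synthetic observations
N = len(data)

def grad_log_post(theta, batch):
    # N(0, 10^2) prior on the mean, unit-variance Gaussian likelihood;
    # the minibatch gradient is rescaled to the full dataset
    return -theta / 100.0 + (N / len(batch)) * np.sum(batch - theta)

theta, samples = 0.0, []
for m in range(5000):
    step = 0.5 / (N * (m + 10) ** 0.6)        # delta_m ~ m^(-0.6)
    batch = rng.choice(data, size=32)
    theta += 0.5 * step * grad_log_post(theta, batch)   # drift term
    theta += np.sqrt(step) * rng.standard_normal()      # injected noise
    samples.append(theta)

posterior_mean = np.mean(samples[1000:])   # discard burn-in iterates
```

The injected noise has variance equal to the step size, so as the steps shrink the iterates trade exploration for an asymptotic bias that, per the entry above, is governed by the schedule (δ_m).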

A Complete Recipe for Stochastic Gradient MCMC

- Computer Science, Mathematics · NIPS
- 2015

This paper provides a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

- Mathematics, Computer Science · COLT
- 2017

The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.

Stochastic Gradient Hamiltonian Monte Carlo

- Mathematics, Computer Science · ICML
- 2014

A variant is introduced that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
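The friction term described in this entry can be sketched directly from the SGHMC update rule. This is an illustrative toy (a standard normal target, exact gradients, and the gradient-noise estimate taken to be zero), not the paper's experimental setup:

```python
import numpy as np

# Sketch: the SGHMC momentum update with a friction coefficient C that
# counteracts gradient noise. Target and constants are illustrative;
# the noise estimate B_hat is assumed to be 0 for simplicity.

rng = np.random.default_rng(3)

def grad_U(theta):        # potential of a standard normal target
    return theta

theta, r = 0.0, 0.0
eps, C = 0.05, 1.0        # step size and friction coefficient
samples = []
for _ in range(20000):
    # momentum: gradient force, friction drag, and injected noise
    r = (r - eps * grad_U(theta) - eps * C * r
         + np.sqrt(2 * C * eps) * rng.standard_normal())
    theta = theta + eps * r
    samples.append(theta)

draws = np.array(samples[2000:])   # discard burn-in
# draws should approximately follow N(0, 1)
```

Without the `- eps * C * r` drag, energy pumped in by noisy gradients would accumulate in the momentum and the chain would drift away from the target; the friction plus matched injected noise restores the desired invariant distribution.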

Hamiltonian Variational Auto-Encoder

- Computer Science, Mathematics · NeurIPS
- 2018

It is shown here how to optimally select reverse kernels in this setting and, by building upon Hamiltonian Importance Sampling (HIS), a scheme is obtained that provides low-variance, unbiased estimators of the ELBO and its gradients using the reparameterization trick.