Corpus ID: 236154956

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

@article{Zhang2021DifferentiableAI,
  title={Differentiable Annealed Importance Sampling and the Perils of Gradient Noise},
  author={Guodong Zhang and Kyle Hsu and Jianing Li and Chelsea Finn and Roger B. Grosse},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.10211}
}
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings correction steps. Differentiability is a desirable property, as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose Differentiable AIS (DAIS), a variant of AIS which ensures differentiability by abandoning the Metropolis-Hastings corrections.
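The abstract sketches the idea only at a high level. The following is a minimal, hypothetical illustration (all function names and constants are ours, not taken from the paper) of what annealed importance weighting without Metropolis-Hastings corrections can look like in JAX: log-weights are accumulated along a geometric annealing path, the transitions are unadjusted Langevin steps, and because no accept/reject decision is made, the whole estimate can be differentiated with respect to model parameters via reparameterization. The paper's actual method uses Hamiltonian-style transitions, so treat this only as a sketch of the general recipe, not as the authors' algorithm.

    import jax
    import jax.numpy as jnp

    def log_prior(z):                   # tractable initial distribution (standard normal)
        return -0.5 * jnp.sum(z ** 2)

    def log_target(z, theta):           # unnormalized target; theta is the parameter we optimize
        return -0.5 * jnp.sum((z - theta) ** 2) / 0.25

    def log_f(z, beta, theta):          # geometric annealing path between prior and target
        return (1.0 - beta) * log_prior(z) + beta * log_target(z, theta)

    def annealed_bound(theta, key, num_steps=32, step_size=1e-2):
        betas = jnp.linspace(0.0, 1.0, num_steps + 1)
        key, sub = jax.random.split(key)
        z = jax.random.normal(sub, (2,))          # z_0 drawn from the prior
        log_w = 0.0
        for k in range(1, num_steps + 1):
            # standard AIS incremental weight, evaluated at the current sample
            log_w = log_w + log_f(z, betas[k], theta) - log_f(z, betas[k - 1], theta)
            # unadjusted Langevin transition aimed at f_{beta_k}; no accept/reject step,
            # so the map from the injected noise to z stays differentiable
            key, sub = jax.random.split(key)
            drift = jax.grad(log_f)(z, betas[k], theta)
            z = z + step_size * drift + jnp.sqrt(2.0 * step_size) * jax.random.normal(sub, z.shape)
        # differentiable surrogate for the AIS log-weight (biased once MH corrections are dropped)
        return log_w

    key = jax.random.PRNGKey(0)
    bound, grad_theta = jax.value_and_grad(annealed_bound)(jnp.array(1.0), key)

Because annealed_bound is an ordinary differentiable JAX function of theta, its gradient can be fed to any stochastic optimizer. In large models the gradients of log_target would typically be estimated from minibatches, which is the kind of gradient noise the second half of the title alludes to; the sketch above uses exact gradients.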

Citations

Surrogate Likelihoods for Variational Annealed Importance Sampling
This work argues theoretically that the resulting algorithm permits the user to make an intuitive trade-off between inference fidelity and computational cost, and shows that the method performs well in practice and is well-suited for black-box inference in probabilistic programming frameworks.
MCMC Variational Inference via Uncorrected Hamiltonian Annealing
It is shown empirically that the proposed framework, which uses an AIS-like procedure with uncorrected Hamiltonian MCMC, yields better performance than other competing approaches, and that the ability to tune its parameters using reparameterization gradients may lead to large performance improvements.
Frequency-Domain Representation of First-Order Methods: A Simple and Robust Framework of Analysis
Motivated by recent applications in min-max optimization, we employ tools from nonlinear control theory in order to analyze a class of “historical” gradient-based methods, for which the next step …

References

Showing 1–10 of 57 references
Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms
This work proposes a new method that lets us leverage reparameterization gradients even when variables are outputs of an acceptance-rejection sampling algorithm, and shows that the variance of the resulting gradient estimator is significantly lower than that of other state-of-the-art methods, leading to faster convergence of stochastic gradient variational inference.
Annealed importance sampling
It is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler, which can be seen as a generalization of a recently proposed variant of sequential importance sampling (the resulting weight is written out after this reference list).
Theoretical guarantees for approximate sampling from smooth and log‐concave densities
Sampling from various kinds of distributions is an issue of paramount importance in statistics since it is often the key ingredient for constructing estimators, test procedures or confidence …
MCMC Variational Inference via Uncorrected Hamiltonian Annealing
It is shown empirically that the proposed framework, which uses an AIS-like procedure with uncorrected Hamiltonian MCMC, yields better performance than other competing approaches, and that the ability to tune its parameters using reparameterization gradients may lead to large performance improvements.
Stochastic Gradient Descent as Approximate Bayesian Inference
It is demonstrated that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models, and a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler, is proposed.
Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics
This article proves that, under verifiable assumptions, the SGLD algorithm is consistent, satisfies a central limit theorem (CLT), and that its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence (δ_m)_{m≥0} (the SGLD update itself is recalled after this reference list).
A Complete Recipe for Stochastic Gradient MCMC
This paper provides a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.
Stochastic Gradient Hamiltonian Monte Carlo
A variant is introduced that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
Hamiltonian Variational Auto-Encoder
It is shown how to optimally select reverse kernels in this setting and, by building on Hamiltonian Importance Sampling (HIS), a scheme is obtained that provides low-variance unbiased estimators of the ELBO and its gradients using the reparameterization trick.
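Two formulas recur throughout the references above; they are recalled here in standard textbook notation (the symbols are ours, not copied from any one paper). First, the annealed importance sampling weight from Neal's construction: with unnormalized densities f_{β_0}, ..., f_{β_K} interpolating from a tractable start p_0 ∝ f_{β_0} to the target f_{β_K}, and MCMC kernels T_k leaving f_{β_k} invariant,

\[
\hat{w} \;=\; \prod_{k=1}^{K} \frac{f_{\beta_k}(z_{k-1})}{f_{\beta_{k-1}}(z_{k-1})},
\qquad z_0 \sim p_0, \quad z_k \sim T_k(z_{k-1}, \cdot),
\]

which satisfies E[ŵ] = Z_K / Z_0, the ratio of normalizing constants, so E[log ŵ] lower-bounds log(Z_K / Z_0) by Jensen's inequality. Second, the stochastic gradient Langevin dynamics update analyzed in several of the entries above: for a posterior over θ given data x_1, ..., x_N and a minibatch B_t of size n,

\[
\theta_{t+1} \;=\; \theta_t + \frac{\delta_t}{2}\Big(\nabla \log p(\theta_t) + \frac{N}{n} \sum_{i \in \mathcal{B}_t} \nabla \log p(x_i \mid \theta_t)\Big) + \eta_t,
\qquad \eta_t \sim \mathcal{N}(0, \delta_t I).
\]

The minibatch term is the source of the gradient noise that the paper's title refers to when such updates are used inside annealed or Langevin-type samplers.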