# Stein Variational Adaptive Importance Sampling

@article{Han2017SteinVA, title={Stein Variational Adaptive Importance Sampling}, author={J. Han and Qiang Liu}, journal={arXiv: Machine Learning}, year={2017} }

We propose a novel adaptive importance sampling algorithm which incorporates the Stein variational gradient descent (SVGD) algorithm with importance sampling (IS). Our algorithm leverages the nonparametric transforms in SVGD to iteratively decrease the KL divergence between our importance proposal and the target distribution. The advantages of this algorithm are twofold: first, our algorithm turns SVGD into a standard IS algorithm, allowing us to use standard diagnostic and analytic tools of IS to…
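The SVGD transform at the heart of the proposal can be sketched in a few lines. The following is a minimal numpy illustration of a plain SVGD update (RBF kernel; the bandwidth, step size, and toy target are illustrative choices), not the paper's full adaptive IS procedure, which additionally tracks importance weights for the transformed proposal:

```python
import numpy as np

def svgd_step(X, grad_log_p, h=1.0, eps=0.1):
    """One SVGD update on particles X (n, d) toward a target with score grad_log_p."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]                   # diff[j, i] = x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)            # RBF kernel k(x_j, x_i)
    grad_K = -2.0 / h * np.einsum("ji,jid->id", K, diff)   # sum_j grad_{x_j} k(x_j, x_i)
    phi = (K.T @ grad_log_p(X) + grad_K) / n               # Stein variational direction
    return X + eps * phi

# Toy run: transport particles from N(5, 1) toward N(0, 1), whose score is -x.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=1.0, size=(200, 1))
for _ in range(500):
    X = svgd_step(X, lambda x: -x)
```

The attraction term `K.T @ grad_log_p(X)` pulls particles toward high-density regions of the target, while `grad_K` acts as a repulsive force that keeps the particle set spread out rather than collapsing to a mode.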

## 22 Citations

### A Stein variational Newton method

- Computer Science, NeurIPS
- 2018

This paper accelerates and generalizes the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space, and shows how second-order information can lead to more effective choices of kernel.

### A Variational Adaptive Population Importance Sampler

- Computer Science, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019

This work introduces a novel AIS scheme which incorporates modern techniques in stochastic optimization to improve the methodology for higher-dimensional posterior inference and shows that the method outperforms other state-of-the-art approaches in high-dimensional scenarios.

### Improved Stein Variational Gradient Descent with Importance Weights

- Computer Science, ArXiv
- 2022

Under certain assumptions, this work provides a descent lemma for the population limit β-SVGD, which covers the descent lemmas for the population limit SVGD when β → 0.

### Stein Variational Gradient Descent Without Gradient

- Computer Science, ICML
- 2018

A gradient-free variant of SVGD (GF-SVGD) is proposed, which replaces the true gradient with a surrogate gradient and corrects the induced bias by re-weighting in a proper form; insights are also shed on the empirical choice of the surrogate gradient.

### Stein Variational Inference for Discrete Distributions

- Computer Science, AISTATS
- 2020

The proposed framework transforms discrete distributions into equivalent piecewise continuous distributions, on which gradient-free SVGD is applied to perform efficient approximate inference; it outperforms existing goodness-of-fit (GOF) test methods for intractable discrete distributions.

### Stein Learning of Variational Autoencoder (Stein VAE)

- Computer Science
- 2017

A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent, demonstrating the scalability of the model to large datasets.

### VAE Learning via Stein Variational Gradient Descent

- Computer Science, NIPS
- 2017

A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent, demonstrating the scalability of the model to large datasets.

### Stein Variational Gradient Descent as Gradient Flow

- Computer Science, NIPS
- 2017

This paper develops the first theoretical analysis on SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by Stein operator.

### Scalable Approximate Inference and Some Applications

- Computer Science, ArXiv
- 2020

This thesis proposes an importance-weighted method to efficiently aggregate local models in distributed learning with one-shot communication, and results on simulated and real datasets indicate the statistical efficiency and wide applicability of the algorithm.

### Advances in Variational Inference

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2019

An overview of recent trends in variational inference is given and a summary of promising future research directions is provided.

## 30 References

### Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

- Computer Science, NIPS
- 2016

We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the…

### Adaptive importance sampling in general mixture classes

- Business, Stat. Comput.
- 2008

An adaptive algorithm is proposed that iteratively updates both the weights and component parameters of a mixture importance sampling density so as to optimise the performance of importance sampling, as measured by an entropy criterion.
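The adaptive idea can be illustrated with a scheme much simpler than the paper's mixture/entropy criterion: a single Gaussian proposal whose mean and scale are re-fit by moment matching on the self-normalized weighted samples at each round. All names, targets, and settings below are illustrative:

```python
import numpy as np

def adaptive_is(n_iter=20, n=5000, seed=0):
    """Adaptive importance sampling with a single Gaussian proposal.

    Target (unnormalized): N(4, 1). Proposal starts at N(0, 3^2); its moments
    are re-estimated from the self-normalized importance weights each round.
    """
    rng = np.random.default_rng(seed)
    log_p = lambda x: -0.5 * (x - 4.0) ** 2
    mu, sigma = 0.0, 3.0
    for _ in range(n_iter):
        x = rng.normal(mu, sigma, size=n)
        log_q = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)  # constants cancel
        log_w = log_p(x) - log_q
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                                  # self-normalized weights
        mu = np.sum(w * x)                            # weighted mean update
        sigma = np.sqrt(np.sum(w * (x - mu) ** 2))    # weighted std update
    return mu, sigma

mu, sigma = adaptive_is()
```

After a few rounds the proposal moments approach those of the target (mu near 4, sigma near 1), which is the sense in which the proposal "adapts" to the target.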

### Auto-Encoding Variational Bayes

- Computer Science, ICLR
- 2014

A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.

### Stein Variational Gradient Descent as Gradient Flow

- Computer Science, NIPS
- 2017

This paper develops the first theoretical analysis on SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by Stein operator.

### Hamiltonian Annealed Importance Sampling for partition function estimation

- Computer Science, ArXiv
- 2012

An extension to annealed importance sampling is introduced that uses Hamiltonian dynamics to rapidly estimate normalization constants; it is demonstrated by computing log-likelihoods in directed and undirected probabilistic image models.

### Improved Variational Inference with Inverse Autoregressive Flow

- Computer Science, NIPS 2016
- 2017

A new type of normalizing flow, inverse autoregressive flow (IAF), is proposed that, in contrast to earlier published flows, scales well to high-dimensional latent spaces and significantly improves upon diagonal Gaussian approximate posteriors.

### Variational Inference with Normalizing Flows

- Computer Science, Mathematics, ICML
- 2015

It is demonstrated that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provide a clear improvement in the performance and applicability of variational inference.
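The mechanism underlying normalizing flows is the change-of-variables formula for densities. A minimal sketch with a single (untrained) affine transform, assuming a standard-normal base density; real flows such as IAF stack many learned invertible transforms and accumulate the log-Jacobian terms:

```python
import numpy as np

def affine_flow_log_density(z_out, a=2.0, b=1.0):
    """log q(z_out) for z_out = a*z + b with base z ~ N(0, 1):
    log q(z_out) = log q0((z_out - b) / a) - log|a|   (change of variables)."""
    z = (z_out - b) / a
    log_q0 = -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)  # standard normal log-density
    return log_q0 - np.log(abs(a))                       # Jacobian correction

# z_out = 2*z + 1 is distributed as N(1, 4); evaluate the log-density at its mean.
val = affine_flow_log_density(1.0)
```

The `- log|a|` term is the one-dimensional log-determinant of the Jacobian; making this term cheap to compute for expressive high-dimensional transforms is exactly the design goal of flows like IAF.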

### Annealed importance sampling

- Mathematics, Stat. Comput.
- 2001

It is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler, which can be seen as a generalization of a recently-proposed variant of sequential importance sampling.
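A minimal numpy sketch of the AIS estimator described above: a geometric path between a standard-normal proposal and an unnormalized Gaussian target, with one Metropolis step per temperature. The target, number of temperatures, and step size are all illustrative choices:

```python
import numpy as np

def ais_log_Z(n=2000, T=100, seed=0):
    """Estimate the log normalizer of an unnormalized target via AIS.

    Proposal f0: standard normal (normalized).
    Target fT (unnormalized): exp(-(x - 3)^2 / (2 * 0.5^2)); true Z = sqrt(2*pi) * 0.5.
    """
    rng = np.random.default_rng(seed)
    log_f0 = lambda x: -0.5 * x ** 2 - 0.5 * np.log(2.0 * np.pi)
    log_fT = lambda x: -(x - 3.0) ** 2 / (2.0 * 0.5 ** 2)
    betas = np.linspace(0.0, 1.0, T + 1)

    x = rng.normal(size=n)          # exact samples from f0
    log_w = np.zeros(n)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Accumulate the incremental weight f_b(x) / f_{b_prev}(x) on the geometric path.
        log_w += (b - b_prev) * (log_fT(x) - log_f0(x))
        # One random-walk Metropolis step targeting f0^(1-b) * fT^b.
        prop = x + 0.5 * rng.normal(size=n)
        log_ratio = ((1.0 - b) * (log_f0(prop) - log_f0(x))
                     + b * (log_fT(prop) - log_fT(x)))
        accept = np.log(rng.uniform(size=n)) < log_ratio
        x = np.where(accept, prop, x)
    m = log_w.max()                 # log-mean-exp of the importance weights
    return m + np.log(np.mean(np.exp(log_w - m)))

Z_hat = np.exp(ais_log_Z())        # true Z = sqrt(2*pi) * 0.5, about 1.2533
```

Each intermediate distribution hands its samples to the next via a Markov transition that leaves it invariant, which is what makes the product of incremental ratios a valid importance weight.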

### Adaptive Importance Sampling via Stochastic Convex Programming

- Mathematics, Computer Science
- 2014

An adaptive importance sampling algorithm is proposed that simultaneously improves the choice of sampling distribution while accumulating a Monte Carlo estimate and it is proved that the method's unbiased estimator has variance that is asymptotically optimal over the exponential family.

### Measuring Sample Quality with Kernels

- Mathematics, ICML
- 2017

A theory of weak convergence for KSDs based on Stein's method is developed; it is demonstrated that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and it is shown that kernels with slowly decaying tails provably determine convergence for a large class of target distributions.