• Corpus ID: 10831269

Stein Variational Adaptive Importance Sampling

@article{Han2017SteinVA,
  title={Stein Variational Adaptive Importance Sampling},
  author={J. Han and Qiang Liu},
  journal={arXiv: Machine Learning},
  year={2017}
}
  • J. Han, Qiang Liu
  • Published 18 April 2017
  • Computer Science
  • arXiv: Machine Learning
We propose a novel adaptive importance sampling algorithm which combines the Stein variational gradient descent algorithm (SVGD) with importance sampling (IS). Our algorithm leverages the nonparametric transforms in SVGD to iteratively decrease the KL divergence between our importance proposal and the target distribution. The advantages of this algorithm are twofold: first, our algorithm turns SVGD into a standard IS algorithm, allowing us to use standard diagnostic and analytic tools of IS to… 
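The abstract combines two standard ingredients: the SVGD particle transform and self-normalized importance sampling. The sketch below is not the authors' implementation — the paper's key contribution of tracking the transformed proposal density is omitted — but illustrates the idea on a hypothetical 1-D Gaussian-mixture target: particles from a Gaussian proposal are transported by SVGD steps, a Gaussian proposal is refit to the transported particles, and that adapted proposal is then used for standard importance sampling.

```python
import numpy as np

# Illustrative simplification, not the paper's algorithm: instead of tracking the
# density of the SVGD-transformed proposal exactly (the paper's contribution),
# we refit a Gaussian proposal to the transported particles and run standard
# self-normalized importance sampling with it. The target, step size, and particle
# counts below are hypothetical choices for a 1-D toy problem.

def log_p(x):
    # Unnormalized log density of a two-component Gaussian mixture target.
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def grad_log_p(x):
    # Score function d/dx log p(x) of the mixture above.
    a = np.exp(-0.5 * (x - 2.0) ** 2)
    b = np.exp(-0.5 * (x + 2.0) ** 2)
    return (-(x - 2.0) * a - (x + 2.0) * b) / (a + b)

def svgd_step(x, step_size=0.1):
    # One SVGD update with an RBF kernel and a median-style bandwidth heuristic.
    diffs = x[:, None] - x[None, :]                   # pairwise differences x_i - x_j
    h = np.median(diffs ** 2) / np.log(len(x) + 1.0)  # bandwidth heuristic
    k = np.exp(-diffs ** 2 / h)                       # kernel matrix k(x_i, x_j)
    grad_k = -2.0 * diffs / h * k                     # entry [i, j] = d/dx_i k(x_i, x_j)
    # Kernel-smoothed score (attractive term) plus kernel-gradient (repulsive term).
    phi = (k @ grad_log_p(x) + grad_k.sum(axis=0)) / len(x)
    return x + step_size * phi

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=200)            # draws from the initial proposal N(0, 1)
for _ in range(200):
    particles = svgd_step(particles)

# Refit a Gaussian proposal to the transported particles, then estimate
# E_p[x^2] by self-normalized importance sampling with that adapted proposal.
mu, sigma = particles.mean(), particles.std()
samples = rng.normal(mu, sigma, size=5000)
log_q = -0.5 * ((samples - mu) / sigma) ** 2 - np.log(sigma)   # Gaussian log density up to a constant
log_w = log_p(samples) - log_q                                 # unnormalized log importance weights
w = np.exp(log_w - log_w.max())
w /= w.sum()
print("Self-normalized IS estimate of E_p[x^2]:", float(np.sum(w * samples ** 2)))
```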

A Stein variational Newton method

This paper accelerates and generalizes the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space, and shows how second-order information can lead to more effective choices of kernel.

A Variational Adaptive Population Importance Sampler

This work introduces a novel AIS scheme which incorporates modern techniques in stochastic optimization to improve the methodology for higher-dimensional posterior inference and shows that the method outperforms other state-of-the-art approaches in high-dimensional scenarios.

Improved Stein Variational Gradient Descent with Importance Weights

Under certain assumptions, this work provides a descent lemma for the population-limit β-SVGD, which covers the descent lemmas for the population-limit SVGD when β → 0.

Stein Variational Gradient Descent Without Gradient

A gradient-free variant of SVGD (GF-SVGD) is proposed, which replaces the true gradient with a surrogate gradient and corrects the induced bias by re-weighting the gradients in a proper form; the work also sheds insight on the empirical choice of the surrogate gradient.

Stein Variational Inference for Discrete Distributions

The proposed framework transforms discrete distributions into equivalent piecewise continuous distributions, on which gradient-free SVGD is applied to perform efficient approximate inference; it outperforms existing goodness-of-fit (GOF) test methods for intractable discrete distributions.

Stein Learning of Variational Autoencoder (Stein VAE)

A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent, demonstrating the scalability of the model to large datasets.

VAE Learning via Stein Variational Gradient Descent

A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent, demonstrating the scalability of the model to large datasets.

Stein Variational Gradient Descent as Gradient Flow

This paper develops the first theoretical analysis of SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by the Stein operator.

Scalable Approximate Inference and Some Applications

This thesis proposes an importance-weighted method to efficiently aggregate local models in distributed learning with one-shot communication, and results on simulated and real datasets indicate the statistical efficiency and wide applicability of the algorithm.

Advances in Variational Inference

An overview of recent trends in variational inference is given and a summary of promising future research directions is provided.

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution, by applying a form of functional gradient descent that minimizes the KL divergence.
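For reference, the update in this cited paper transports each particle along the kernelized Stein direction (standard SVGD notation, reproduced here from the broader SVGD literature rather than from this page):

$$
\phi^{*}(x) \;=\; \mathbb{E}_{x' \sim q}\!\left[\, k(x', x)\,\nabla_{x'} \log p(x') \;+\; \nabla_{x'} k(x', x) \,\right],
\qquad x \leftarrow x + \epsilon\,\phi^{*}(x),
$$

where q is the current particle distribution, p the target, k a positive-definite kernel, and ε a small step size.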

Adaptive importance sampling in general mixture classes

An adaptive algorithm is proposed that iteratively updates both the weights and component parameters of a mixture importance sampling density so as to optimise the performance of importance sampling, as measured by an entropy criterion.

Auto-Encoding Variational Bayes

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

Hamiltonian Annealed Importance Sampling for partition function estimation

An extension to annealed importance sampling is introduced that uses Hamiltonian dynamics to rapidly estimate normalization constants; it is demonstrated by computing log likelihoods in directed and undirected probabilistic image models.

Improved Variational Inference with Inverse Autoregressive Flow

A new type of normalizing flow, inverse autoregressive flow (IAF), is proposed that, in contrast to earlier published flows, scales well to high-dimensional latent spaces and significantly improves upon diagonal Gaussian approximate posteriors.

Variational Inference with Normalizing Flows

It is demonstrated that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provide a clear improvement in the performance and applicability of variational inference.

Annealed importance sampling

It is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler, which can be seen as a generalization of a recently-proposed variant of sequential importance sampling.

Adaptive Importance Sampling via Stochastic Convex Programming

An adaptive importance sampling algorithm is proposed that simultaneously improves the choice of sampling distribution while accumulating a Monte Carlo estimate and it is proved that the method's unbiased estimator has variance that is asymptotically optimal over the exponential family.

Measuring Sample Quality with Kernels

A theory of weak convergence for kernel Stein discrepancies (KSDs) based on Stein's method is developed; it is demonstrated that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and it is shown that kernels with slowly decaying tails provably determine convergence for a large class of target distributions.