Corpus ID: 214802067

The equivalence between Stein variational gradient descent and black-box variational inference

Casey Chu, Kentaro Minami, Kenji Fukumizu
We formalize an equivalence between two popular methods for Bayesian inference: Stein variational gradient descent (SVGD) and black-box variational inference (BBVI). In particular, we show that BBVI corresponds precisely to SVGD when the kernel is the neural tangent kernel. Furthermore, we interpret SVGD and BBVI as kernel gradient flows; we do this by leveraging the recent perspective that views SVGD as a gradient flow in the space of probability distributions and showing that BBVI naturally… 
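The SVGD update the abstract refers to is straightforward to sketch. Below is a minimal, self-contained NumPy illustration with an RBF kernel and a standard-normal target; the kernel choice, target, particle count, and step size are assumptions for illustration only, not the neural tangent kernel setting of the paper.

```python
import numpy as np

def rbf_kernel(x, h=1.0):
    """RBF kernel matrix K[i, j] = k(x_i, x_j) and gradK[i, j] = grad_{x_i} k(x_i, x_j)."""
    diffs = x[:, None, :] - x[None, :, :]         # pairwise differences, shape (n, n, d)
    K = np.exp(-(diffs ** 2).sum(-1) / (2 * h))   # kernel matrix, shape (n, n)
    gradK = -diffs / h * K[..., None]             # kernel gradients, shape (n, n, d)
    return K, gradK

def svgd_step(particles, grad_log_p, kernel, step_size=0.1):
    """One SVGD update: phi(x_i) = (1/n) sum_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)]."""
    n = len(particles)
    K, gradK = kernel(particles)
    # Driving term pulls particles toward high density; the kernel-gradient term repels them.
    phi = (K @ grad_log_p(particles) + gradK.sum(axis=0)) / n
    return particles + step_size * phi

# Toy run: particles initialized far from a standard-normal target (grad log p(x) = -x).
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=0.5, size=(50, 1))
for _ in range(500):
    x = svgd_step(x, lambda p: -p, rbf_kernel)
```

After the loop the particles approximate the target: their mean drifts toward 0 while the repulsive term spreads them out to roughly unit scale.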


Stein Variational Gaussian Processes
SVGD provides a non-parametric alternative to variational inference that is substantially faster than MCMC yet unhindered by parametric assumptions; for GP models with Lipschitz gradients, the SVGD algorithm is proved to monotonically decrease the Kullback-Leibler divergence from the sampling distribution to the true posterior.
A Non-Asymptotic Analysis for Stein Variational Gradient Descent
A descent lemma is obtained establishing that the SVGD algorithm decreases the objective at each iteration, and provably converges, with less restrictive assumptions on the step size than required in earlier analyses.
A Neural Tangent Kernel Perspective of GANs
A novel theoretical framework for analyzing Generative Adversarial Networks (GANs) is proposed, leveraging the theory of infinite-width neural networks for the discriminator via its Neural Tangent Kernel, to characterize the trained discriminator for a wide range of losses and establish general differentiability properties of the network.


Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
This paper shows that generative adversarial networks, variational inference, and actor-critic methods in reinforcement learning can all be seen through the lens of this framework, and discusses a generic optimization algorithm for this formulation, called probability functional descent (PFD).
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
We propose a general-purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution.
Black Box Variational Inference
This paper presents a "black box" variational inference algorithm that can be quickly applied to many models with little additional derivation, based on stochastic optimization of the variational objective, where the noisy gradient is computed from Monte Carlo samples from the variational distribution.
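The noisy Monte Carlo gradient described here is the score-function (REINFORCE) estimator. A minimal sketch with a one-dimensional Gaussian variational family, where the target density, step sizes, and sample counts are illustrative assumptions rather than the paper's actual setup:

```python
import numpy as np

def bbvi_gradient(mu, log_sigma, log_p, rng, n_samples=200):
    """Noisy ELBO gradient via the score-function estimator:
    grad ELBO ~ mean_s[ grad_lambda log q(z_s) * (log p(z_s) - log q(z_s)) ], z_s ~ q."""
    sigma = np.exp(log_sigma)
    z = rng.normal(mu, sigma, size=n_samples)     # Monte Carlo samples from q
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    score_mu = (z - mu) / sigma ** 2              # d log q / d mu
    score_ls = ((z - mu) / sigma) ** 2 - 1.0      # d log q / d log_sigma
    weight = log_p(z) - log_q
    return np.mean(score_mu * weight), np.mean(score_ls * weight)

# Toy target: unnormalized N(2, 1), so the ELBO optimum is mu = 2, sigma = 1.
log_p = lambda z: -0.5 * (z - 2.0) ** 2
rng = np.random.default_rng(1)
mu, log_sigma = 0.0, 0.0
for _ in range(2000):
    g_mu, g_ls = bbvi_gradient(mu, log_sigma, log_p, rng)
    mu += 0.05 * g_mu
    log_sigma += 0.05 * g_ls
```

Note that only samples of `log p` and `log q` are needed, never gradients of the model itself, which is what makes the method "black box."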
On the geometry of Stein variational gradient descent
This paper focuses on the recently introduced Stein variational gradient descent methodology, a class of algorithms that rely on iterated steepest descent steps with respect to a reproducing kernel Hilbert space norm, and considers certain nondifferentiable kernels with adjusted tails.
Learning to Draw Samples with Amortized Stein Variational Gradient Descent
A simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference, based on iteratively adjusting the neural network parameters so that the output changes along a Stein variational gradient direction that maximally decreases the KL divergence from the target distribution.
Stein Variational Gradient Descent as Gradient Flow
This paper develops the first theoretical analysis of SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by the Stein operator.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Smoothness and Stability in GANs
This work develops a principled theoretical framework for understanding the stability of various types of GANs and derives conditions that guarantee eventual stationarity of the generator when it is trained with gradient descent, conditions that must be satisfied by the divergence that is minimized by the GAN and the generator's architecture.
Understanding and Accelerating Particle-Based Variational Inference
This work unifies the various finite-particle approximations that existing ParVIs use and recognizes that the approximation is essentially a compulsory smoothing treatment, in either of two equivalent forms.
Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference
We propose a simple and general variant of the standard reparameterized gradient estimator for the variational evidence lower bound. Specifically, we remove a part of the total derivative with respect to the variational parameters that corresponds to the score function.