• Corpus ID: 236171259

Neural Variational Gradient Descent

Lauro Langosco, Vincent Fortuin and Heiko Strathmann
Particle-based approximate Bayesian inference approaches such as Stein Variational Gradient Descent (SVGD) combine the flexibility and convergence guarantees of sampling methods with the computational benefits of variational inference. In practice, SVGD relies on the choice of an appropriate kernel function, which impacts its ability to model the target distribution—a challenging problem with only heuristic solutions. We propose Neural Variational Gradient Descent (NVGD), which is based on… 
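To make the kernel dependence concrete, here is a minimal sketch of the standard SVGD particle update that NVGD builds on (this illustrates plain SVGD with a fixed RBF kernel, not the paper's neural parametrization; the bandwidth `h`, step size, and Gaussian target are illustrative choices, not from the paper):

```python
import numpy as np

def rbf_kernel(x, y, h=1.0):
    """RBF kernel value and its gradient w.r.t. the first argument."""
    diff = x - y
    k = np.exp(-np.sum(diff ** 2) / (2 * h ** 2))
    grad = -diff / h ** 2 * k  # d k(x, y) / d x
    return k, grad

def svgd_step(particles, grad_log_p, step=0.1, h=1.0):
    """One SVGD update: kernel-smoothed score term plus a repulsive term."""
    n = len(particles)
    new = np.copy(particles)
    for i in range(n):
        phi = np.zeros_like(particles[i])
        for j in range(n):
            k, gk = rbf_kernel(particles[j], particles[i], h)
            # First term drives particles toward high density;
            # second (kernel gradient) pushes particles apart.
            phi += k * grad_log_p(particles[j]) + gk
        new[i] += step * phi / n
    return new

# Toy target: standard 1-D Gaussian, whose score is -x.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=0.5, size=(30, 1))  # start far from the target
for _ in range(300):
    x = svgd_step(x, lambda p: -p, step=0.1)
```

The quality of the resulting particle approximation hinges on the fixed kernel `k`; NVGD's premise is to learn this component rather than pick it heuristically.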

Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization

By solving the convex SDP, the optimal approximation of the Wasserstein gradient direction is obtained in this class of functions, and numerical experiments including PDE-constrained Bayesian inference and parameter estimation in COVID-19 modeling demonstrate the effectiveness of the proposed method.

Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks

An approach is proposed that relies on the recently introduced input-convex neural networks (ICNN) to parametrize the space of convex functions in order to approximate the JKO scheme, along with functionals over measures that enjoy convergence guarantees.

Regularized seismic amplitude inversion via variational inference

Over the years, seismic amplitude variation with offset has been successfully applied for predicting the elastic properties of the subsurface. Nevertheless, the solution of the amplitude inversion is… 

A Stein variational Newton method

This paper accelerates and generalizes the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space, and shows how second-order information can lead to more effective choices of kernel.

Annealed Stein Variational Gradient Descent

This work empirically explores the ability of Stein variational gradient descent to sample from multi-modal distributions and proposes an annealing schedule to solve two important issues: the inability of the particles to escape from local modes and the inefficacy in reproducing the density of the different regions.

Function Space Particle Optimization for Bayesian Neural Networks

This paper demonstrates through extensive experiments that its method successfully overcomes this issue and outperforms strong baselines in a variety of tasks including prediction, defense against adversarial examples, and reinforcement learning.

On Stein Variational Neural Network Ensembles

It is found that SVGD using functional and hybrid kernels can overcome the limitations of deep ensembles, improves functional diversity and uncertainty estimation, and approaches the true Bayesian posterior more closely.

Kernel Stein Generative Modeling

Noise conditional kernel SVGD (NCK-SVGD) works in tandem with the recently introduced Noise Conditional Score Network estimator and offers flexible control between sample quality and diversity in gradient-based explicit generative modeling.

Operator Variational Inference

A black box algorithm, operator variational inference (OPVI), for optimizing any operator objective, which can characterize different properties of variational objectives, such as objectives that admit data subsampling---allowing inference to scale to massive data---as well as objectives that admit variational programs---a rich class of posterior approximations that does not require a tractable density.

Message Passing Stein Variational Gradient Descent

Experimental results show MP-SVGD's advantage over SVGD in preventing the vanishing repulsive force in high-dimensional spaces, as well as its particle efficiency and approximation flexibility over other inference methods on graphical models.

Stein Variational Gradient Descent as Gradient Flow

This paper develops the first theoretical analysis of SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by the Stein operator.
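For context, the SVGD update direction whose continuous-time limit this analysis studies takes the standard kernelized Stein form (reproduced here from the SVGD literature, not a result specific to this paper):

$$
\phi^{*}(\cdot) \;=\; \mathbb{E}_{x \sim \mu}\!\left[\, k(x, \cdot)\, \nabla_x \log p(x) \;+\; \nabla_x k(x, \cdot) \,\right]
$$

The first term transports particles toward high-density regions of the target $p$, while the second acts as a repulsive force that prevents particle collapse; the gradient-flow view shows that following $\phi^{*}$ decreases $\mathrm{KL}(\mu \,\|\, p)$ under a kernel-induced metric.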

Improving predictions of Bayesian neural networks via local linearization

In this paper we argue that in Bayesian deep learning, the frequently utilized generalized Gauss-Newton (GGN) approximation should be understood as a modification of the underlying probabilistic… 

What Are Bayesian Neural Network Posteriors Really Like?

It is shown that BNNs can achieve significant performance gains over standard training and deep ensembles, that a single long HMC chain can provide a representation of the posterior comparable to multiple shorter chains, and that posterior tempering is not needed for near-optimal performance.