# Variational Inference: A Review for Statisticians

@article{Blei2016VariationalIA, title={Variational Inference: A Review for Statisticians}, author={David M. Blei and Alp Kucukelbir and Jon D. McAuliffe}, journal={Journal of the American Statistical Association}, year={2016}, volume={112}, pages={859--877} }

**Abstract:** One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this article, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than…
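The abstract's claim that VI "approximates probability densities through optimization" has a standard formulation, which the paper develops: pick the member of a tractable family $\mathcal{Q}$ closest in KL divergence to the posterior, which is equivalent to maximizing the evidence lower bound (ELBO). A sketch in common notation (the symbols $q$, $p$, $x$, $z$ follow standard usage):

```latex
\begin{align*}
q^{*}(z) &= \operatorname*{arg\,min}_{q \in \mathcal{Q}} \; \mathrm{KL}\bigl(q(z)\,\|\,p(z \mid x)\bigr), \\
\mathrm{ELBO}(q) &= \mathbb{E}_{q}\bigl[\log p(x, z)\bigr] - \mathbb{E}_{q}\bigl[\log q(z)\bigr]
  = \log p(x) - \mathrm{KL}\bigl(q(z)\,\|\,p(z \mid x)\bigr).
\end{align*}
```

Because $\log p(x)$ is a constant in $q$, maximizing the ELBO minimizes the KL divergence without ever computing the intractable evidence.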

## 2,838 Citations

### Elements of Sequential Monte Carlo

- Computer Science, Found. Trends Mach. Learn.
- 2019

This tutorial reviews sequential Monte Carlo, a random-sampling-based class of methods for approximate inference, and discusses the SMC estimate of the normalizing constant, how this can be used for pseudo-marginal inference and inference evaluation.

### Statistical Inference in Mean-Field Variational Bayes

- Computer Science, Mathematics
- 2019

It is shown that the mean-field approximation to the posterior can be well approximated, relative to the Kullback-Leibler divergence, by a normal distribution whose center is the maximum likelihood estimator (MLE).

### Advances in Variational Inference

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2019

An overview of recent trends in variational inference is given and a summary of promising future research directions is provided.

### An Introduction to Variational Inference

- Computer Science, ArXiv
- 2021

This paper introduces the concept of Variational Inference (VI), a popular method in machine learning that uses optimization techniques to estimate complex probability densities, and discusses applications of VI to variational autoencoders and the VAE-Generative Adversarial Network.

### On the computational asymptotics of Gaussian variational inference

- Computer Science, Mathematics
- 2020

This work provides a theoretical analysis of the asymptotic convexity properties of variational inference in the popular setting with a Gaussian family, and an algorithm, CSVI, that exploits these properties to find the optimal approximation in the asymptotic regime.

### Frequentist Consistency of Variational Bayes

- Mathematics, Computer Science, Journal of the American Statistical Association
- 2018

It is proved that the VB posterior converges to the Kullback–Leibler (KL) minimizer of a normal distribution centered at the truth, and that the corresponding variational expectation of the parameter is consistent and asymptotically normal.

### Variational approximations using Fisher divergence

- Computer Science, ArXiv
- 2019

This work proposes the construction of variational approximations based on minimizing the Fisher divergence, and develops an efficient computational algorithm that can be applied to a wide range of models without conjugacy or potentially unrealistic mean-field assumptions.
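For context, the Fisher divergence mentioned here is usually defined through score functions; the following is the standard form (whether this paper uses exactly this variant is an assumption):

```latex
F(q \,\|\, p) \;=\; \mathbb{E}_{q}\Bigl[\,\bigl\|\nabla_z \log q(z) - \nabla_z \log p(z \mid x)\bigr\|_2^2\,\Bigr].
```

Because $\nabla_z \log p(z \mid x) = \nabla_z \log p(z, x)$, the normalizing constant drops out of the gradient, which is what lets such constructions avoid conjugacy requirements.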

### On variational approximations for frequentist and bayesian inference

- Computer Science, Mathematics
- 2018

This thesis employs a Gaussian variational approximation strategy to handle frequentist generalized linear mixed models with general random-effects design matrices, such as those including spline basis functions, and derives algorithms for models containing higher-level random effects and non-normal responses, streamlined for computational efficiency.

### Boosting Variational Inference: an Optimization Perspective

- Computer Science, AISTATS
- 2018

This work studies the convergence properties of boosting variational inference from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm and yields novel theoretical insights regarding the sufficient conditions for convergence, explicit rates, and algorithmic simplifications.

### Boosting Black Box Variational Inference

- Computer Science, NeurIPS
- 2018

This work shows that boosting VI satisfies a relaxed smoothness assumption which is sufficient for the convergence of the functional Frank-Wolfe (FW) algorithm, and proposes to maximize the Residual ELBO (RELBO) which replaces the standard ELBO optimization in VI.

## References

Showing 1–10 of 188 references

### An Introduction to Bayesian Inference via Variational Approximations

- Computer Science, Political Analysis
- 2011

This paper demonstrates how variational approximations can be used to facilitate the application of Bayesian models to political science data, including models to describe legislative voting blocs and statistical models for political texts.

### Black Box Variational Inference

- Computer Science, AISTATS
- 2014

This paper presents a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation, based on stochastic optimization of the variational objective, where the noisy gradient is computed from Monte Carlo samples drawn from the variational distribution.
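The noisy-gradient idea described above can be sketched with the score-function estimator, $\nabla_\lambda \mathrm{ELBO} = \mathbb{E}_q[\nabla_\lambda \log q(z;\lambda)\,(\log p(x,z) - \log q(z;\lambda))]$. The toy target, Gaussian family, and step-size schedule below are illustrative choices, not details from the paper (practical black-box VI adds Rao-Blackwellization and control variates to reduce the estimator's variance):

```python
import math
import random

def log_joint(z):
    # Unnormalized log joint; a toy stand-in whose posterior is N(2, 1).
    return -0.5 * (z - 2.0) ** 2

def log_q(z, mu, log_sigma):
    # Log density of the variational family q = N(mu, sigma^2).
    sigma = math.exp(log_sigma)
    return -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)

def grad_log_q(z, mu, log_sigma):
    # Score function: gradient of log q with respect to (mu, log_sigma).
    sigma = math.exp(log_sigma)
    d_mu = (z - mu) / sigma ** 2
    d_log_sigma = ((z - mu) / sigma) ** 2 - 1.0
    return d_mu, d_log_sigma

def bbvi(steps=5000, n_samples=20, lr=0.05, seed=0):
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0
    for t in range(steps):
        g_mu = g_ls = 0.0
        for _ in range(n_samples):
            z = rng.gauss(mu, math.exp(log_sigma))
            # Noisy ELBO gradient: score * (log joint - log q).
            w = log_joint(z) - log_q(z, mu, log_sigma)
            d_mu, d_ls = grad_log_q(z, mu, log_sigma)
            g_mu += d_mu * w / n_samples
            g_ls += d_ls * w / n_samples
        step = lr / (1 + t) ** 0.5  # Robbins-Monro-style decaying step size
        mu += step * g_mu
        log_sigma += step * g_ls
    return mu, math.exp(log_sigma)
```

On this toy problem the iterates drift toward the true posterior mean and standard deviation; only `log_joint` needs changing to target another model, which is the "black box" appeal.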

### Variational Inference for Large-Scale Models of Discrete Choice

- Computer Science
- 2010

Extensive simulations, along with an analysis of real-world data, demonstrate that variational methods achieve accuracy competitive with Markov chain Monte Carlo at a small fraction of the computational cost.

### Variational Bayesian Inference with Stochastic Search

- Computer Science, ICML
- 2012

This work presents an alternative algorithm based on stochastic optimization that allows direct optimization of the variational lower bound, and demonstrates the approach on two non-conjugate models: logistic regression and an approximation to the hierarchical Dirichlet process (HDP).

### Ensemble learning in Bayesian neural networks

- Computer Science
- 1998

This chapter shows how the ensemble learning approach can be extended to full-covariance Gaussian distributions while remaining computationally tractable, and extends the framework to deal with hyperparameters, leading to a simple re-estimation procedure.

### Laplace Variational Approximation for Semiparametric Regression in the Presence of Heteroscedastic Errors

- Computer Science, Mathematics, Journal of Computational and Graphical Statistics
- 2016

A mean field variational approximation with an embedded Laplace approximation is derived to account for the nonconjugate structure; properly accounting for the smooth heteroscedasticity leads to significant improvements in posterior inference for key physical characteristics of an organic molecule.

### Variational inference for Dirichlet process mixtures

- Computer Science
- 2006

A variational inference algorithm for Dirichlet process (DP) mixtures is presented, along with experiments comparing the algorithm to Gibbs sampling for DP mixtures of Gaussians and an application to a large-scale image analysis problem.

### On Variational Bayes Estimation and Variational Information Criteria for Linear Regression Models

- Computer Science, Mathematics
- 2014

It is proved that under mild regularity conditions, VB-based estimators enjoy desirable frequentist properties such as consistency and can be used to obtain asymptotically valid standard errors.

### Natural Conjugate Gradient in Variational Inference

- Computer Science, ICONIP
- 2007

This work proposes using the geometry of the variational approximating distribution to speed up a conjugate gradient method for variational learning and inference, showing significant speedups over alternative learning algorithms.

### Latent-Space Variational Bayes

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2008

This paper introduces a more general approximate inference framework for conjugate-exponential family models (latent-space variational Bayes, LSVB). It is shown that the LSVB approach gives better estimates of the model evidence and of the distribution over latent variables than the VBEM approach, although in practice the distribution over latent variables has to be approximated.