# Convergence of Sparse Variational Inference in Gaussian Processes Regression

@article{Burt2020ConvergenceOS, title={Convergence of Sparse Variational Inference in Gaussian Processes Regression}, author={David R. Burt and Carl Edward Rasmussen and Mark van der Wilk}, journal={J. Mach. Learn. Res.}, year={2020}, volume={21}, pages={131:1-131:63} }

Gaussian processes are distributions over functions that are versatile and mathematically convenient priors in Bayesian modelling. However, their use is often impeded for data with large numbers of observations, $N$, due to the cubic (in $N$) cost of matrix operations used in exact inference. Many solutions have been proposed that rely on $M \ll N$ inducing variables to form an approximation at a cost of $\mathcal{O}(NM^2)$. While the computational cost appears linear in $N$, the true…

## 21 Citations

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

- Computer Science
- ArXiv
- 2022

This work provides novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method, leading to improved error bounds in both regression and optimization.

Tighter Bounds on the Log Marginal Likelihood of Gaussian Process Regression Using Conjugate Gradients

- Computer Science
- ICML
- 2021

It is shown that approximate maximum likelihood learning of model parameters by maximising the lower bound retains many benefits of the sparse variational approach while reducing the bias introduced into hyperparameter learning.

Ultra-fast Deep Mixtures of Gaussian Process Experts

- Computer Science
- ArXiv
- 2020

This article proposes using a deep neural network (DNN) as the gating network that selects the experts in such mixtures of sparse GPs, yielding a flexible, robust, and efficient model that is able to significantly outperform competing models.

How Good are Low-Rank Approximations in Gaussian Process Regression?

- Computer Science
- ArXiv
- 2021

This work bounds the Kullback–Leibler divergence between an exact GP and one resulting from a low-rank approximation to its kernel, as well as between their corresponding predictive densities, and also bounds the error between predictive mean vectors and between predictive covariance matrices.

Connections and Equivalences between the Nyström Method and Sparse Variational Gaussian Processes

- Computer Science
- ArXiv
- 2021

This work studies the two popular approaches, the Nyström and SVGP approximations, in the context of a regression problem, and establishes various connections and equivalences between them, providing an RKHS interpretation of the SVGP approximation and revealing the origin of the algebraic equivalence between the two approaches.

Contraction rates for sparse variational approximations in Gaussian process regression

- Computer Science
- 2021

The theoretical properties of a variational Bayes method in the Gaussian process regression model are studied, and it is shown that for three particular covariance kernels the VB approach can achieve optimal, minimax contraction rates for a sufficiently large number of appropriately chosen inducing variables.

Sample and Computationally Efficient Stochastic Kriging in High Dimensions

- Computer Science
- 2020

This work develops a novel methodology that dramatically alleviates the curse of dimensionality, and demonstrates via extensive numerical experiments that the methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.

Barely Biased Learning for Gaussian Process Regression

- Computer Science
- ArXiv
- 2021

This work suggests a method that adaptively selects the amount of computation to use when estimating the log marginal likelihood so that the bias of the objective function is guaranteed to be small.

Gaussian Processes on Hypergraphs

- Computer Science
- ArXiv
- 2021

The utility of this framework is demonstrated on three challenging real-world problems: multi-class classification of legislators' political party affiliation on the basis of voting behaviour, probabilistic matrix factorisation of movie reviews, and embedding a hypergraph of animals into a low-dimensional latent space.

A universal probabilistic spike count model reveals ongoing modulation of neural variability

- Biology, Computer Science
- bioRxiv
- 2021

A universal probabilistic spike count model is presented. It reveals that neural variability defies a simple parametric relationship with mean spike count as assumed in standard models, that its modulation by external covariates can be comparably strong to that of the mean firing rate, and that slow low-dimensional latent factors explain away neural correlations.

## References

SHOWING 1-10 OF 65 REFERENCES

Rates of Convergence for Sparse Variational Gaussian Process Regression

- Computer Science
- ICML
- 2019

The results show that as datasets grow, Gaussian process posteriors can truly be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.

Scalable Gaussian process inference using variational methods

- Computer Science
- 2017

Various theoretical issues arising from the application of variational inference to the infinite dimensional Gaussian process setting are settled decisively and a new argument for existing approaches to variational regression that settles debate about their applicability is given.

Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees

- Computer Science
- AISTATS
- 2019

This work develops an approach to scalable approximate GP regression with finite-data guarantees on the accuracy of pointwise posterior mean and variance estimates, and introduces a novel objective for approximate inference in the nonparametric setting: the preconditioned Fisher (pF) divergence.

Variational Fourier Features for Gaussian Processes

- Computer Science
- J. Mach. Learn. Res.
- 2017

This work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances, derives these expressions for Matérn kernels in one dimension, and generalizes to more dimensions using kernels with specific structures.

Hilbert space methods for reduced-rank Gaussian process regression

- Computer Science, Mathematics
- Stat. Comput.
- 2020

The method is compared with previously proposed methods both theoretically and through empirical tests on simulated and real data, and the approximation is shown to become exact when the size of the compact subset and the number of eigenfunctions go to infinity.

Sparse Gaussian Processes using Pseudo-inputs

- Computer Science
- NIPS
- 2005

It is shown that this new Gaussian process (GP) regression model can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.

Practical Posterior Error Bounds from Variational Objectives

- Computer Science
- ArXiv
- 2019

This paper provides rigorous bounds on the error of posterior mean and uncertainty estimates that arise from full-distribution approximations, as in variational inference.

Two problems with variational expectation maximisation for time-series models

- Business
- 2011

Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground, retaining distributional information about uncertainty in…

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

- Computer Science
- COLT
- 2019

This work presents BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP.

On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes

- Computer Science
- AISTATS
- 2016

A substantial generalization of the literature on the variational framework for learning inducing variables is given, together with a new proof of the result for infinite index sets that allows inducing points that are not data points and likelihoods that depend on all function values.