• Corpus ID: 220934491

# Convergence of Sparse Variational Inference in Gaussian Processes Regression

@article{Burt2020ConvergenceOS,
title={Convergence of Sparse Variational Inference in Gaussian Processes Regression},
author={David R. Burt and Carl Edward Rasmussen and Mark van der Wilk},
journal={J. Mach. Learn. Res.},
year={2020},
volume={21},
pages={131:1-131:63}
}
• Published 1 August 2020
Gaussian processes are distributions over functions that are versatile and mathematically convenient priors in Bayesian modelling. However, their use is often impeded for data with large numbers of observations, $N$, due to the cubic (in $N$) cost of matrix operations used in exact inference. Many solutions have been proposed that rely on $M \ll N$ inducing variables to form an approximation at a cost of $\mathcal{O}(NM^2)$. While the computational cost appears linear in $N$, the true…
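The $\mathcal{O}(NM^2)$ approximation the abstract refers to can be illustrated with the collapsed variational lower bound for sparse GP regression (Titsias, 2009), which this paper analyses. Below is a minimal NumPy sketch; the RBF kernel, hyperparameter values, and function names are illustrative assumptions, not code from the paper:

```python
import numpy as np

def rbf(X, Z, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel matrix k(X, Z); an illustrative choice.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sgpr_elbo(X, y, Z, noise_var):
    """Collapsed variational lower bound on log p(y) with M inducing inputs Z.

    Cost is O(N M^2): the only N-by-M work is forming and whitening Kuf.
    """
    N, M = X.shape[0], Z.shape[0]
    Kuu = rbf(Z, Z) + 1e-8 * np.eye(M)                 # jitter for stability
    Kuf = rbf(Z, X)
    L = np.linalg.cholesky(Kuu)
    A = np.linalg.solve(L, Kuf) / np.sqrt(noise_var)   # M x N
    B = np.eye(M) + A @ A.T
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / np.sqrt(noise_var)
    # log N(y | 0, Qff + noise_var * I) via the matrix inversion lemma,
    # where Qff = Kfu Kuu^{-1} Kuf is the Nystrom approximation to Kff.
    log_det = 2.0 * np.log(np.diag(LB)).sum() + N * np.log(noise_var)
    quad = y @ y / noise_var - c @ c
    log_q = -0.5 * (N * np.log(2.0 * np.pi) + log_det + quad)
    # Trace correction -tr(Kff - Qff) / (2 noise_var); this term is what
    # makes the expression a true lower bound on the log marginal likelihood.
    kff_diag = np.full(N, 1.0)                         # RBF variance is 1 here
    trace = kff_diag.sum() - noise_var * (A * A).sum()
    return log_q - 0.5 * trace / noise_var
```

Taking `Z` equal to `X` makes the bound tight (up to jitter), recovering the exact log marginal likelihood; with $M \ll N$ the slack is governed by the trace of $K_{ff} - Q_{ff}$, the quantity at the heart of the paper's convergence analysis.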

## Citations of this paper

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning
• Computer Science
ArXiv
• 2022
This work provides novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method, leading to improved error bounds in both regression and optimization.
Tighter Bounds on the Log Marginal Likelihood of Gaussian Process Regression Using Conjugate Gradients
• Computer Science
ICML
• 2021
It is shown that approximate maximum likelihood learning of model parameters by maximising the lower bound retains many benefits of the sparse variational approach while reducing the bias introduced into hyperparameter learning.
Ultra-fast Deep Mixtures of Gaussian Process Experts
• Computer Science
ArXiv
• 2020
This article proposes designing the gating network that selects experts from a mixture of sparse GPs using a deep neural network (DNN), yielding a flexible, robust, and efficient model that significantly outperforms competing models.
How Good are Low-Rank Approximations in Gaussian Process Regression?
• Computer Science
ArXiv
• 2021
This work bounds the Kullback–Leibler divergence between an exact GP and one whose kernel is replaced by a low-rank approximation, as well as between their corresponding predictive densities, and also bounds the error between the predictive mean vectors and between the predictive covariance matrices.
Connections and Equivalences between the Nyström Method and Sparse Variational Gaussian Processes
• Computer Science
ArXiv
• 2021
This work studies the two popular approaches, the Nyström and SVGP approximations, in the context of a regression problem, and establishes various connections and equivalences between them, providing an RKHS interpretation of the SVGP approximation and revealing the origin of the algebraic equivalence between the two approaches.
Contraction rates for sparse variational approximations in Gaussian process regression
• Computer Science
• 2021
The theoretical properties of a variational Bayes method in the Gaussian process regression model are studied, and it is shown that for three particular covariance kernels the VB approach can achieve optimal, minimax contraction rates for a sufficiently large number of appropriately chosen inducing variables.
Sample and Computationally Efficient Stochastic Kriging in High Dimensions
• Computer Science
• 2020
This work develops a novel methodology that dramatically alleviates the curse of dimensionality, and demonstrates via extensive numerical experiments that the methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.
Barely Biased Learning for Gaussian Process Regression
• Computer Science
ArXiv
• 2021
This work suggests a method that adaptively selects the amount of computation to use when estimating the log marginal likelihood so that the bias of the objective function is guaranteed to be small.
Gaussian Processes on Hypergraphs
• Computer Science
ArXiv
• 2021
The utility of this framework is demonstrated on three challenging real-world problems: multi-class classification of legislators' political party affiliation from voting behaviour, probabilistic matrix factorisation of movie reviews, and embedding a hypergraph of animals into a low-dimensional latent space.
A universal probabilistic spike count model reveals ongoing modulation of neural variability
• Biology, Computer Science
bioRxiv
• 2021
A universal probabilistic spike count model is presented, showing that neural variability defies the simple parametric relationship with mean spike count assumed in standard models, that its modulation by external covariates can be comparably strong to that of the mean firing rate, and that slow low-dimensional latent factors explain away neural correlations.

## References

SHOWING 1-10 OF 65 REFERENCES
Rates of Convergence for Sparse Variational Gaussian Process Regression
• Computer Science
ICML
• 2019
The results show that as datasets grow, Gaussian process posteriors can truly be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.
Scalable Gaussian process inference using variational methods
Various theoretical issues arising from applying variational inference in the infinite-dimensional Gaussian process setting are settled decisively, and a new argument is given for existing approaches to variational regression that settles debate about their applicability.
Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees
• Computer Science
AISTATS
• 2019
This work develops an approach to scalable approximate GP regression with finite-data guarantees on the accuracy of pointwise posterior mean and variance estimates, and introduces a novel objective for approximate inference in the nonparametric setting: the preconditioned Fisher (pF) divergence.
Variational Fourier Features for Gaussian Processes
• Computer Science
J. Mach. Learn. Res.
• 2017
This work hinges on a key result: there exist spectral features related to a finite domain of the Gaussian process that exhibit almost-independent covariances. These expressions are derived for Matérn kernels in one dimension and generalized to more dimensions using kernels with specific structures.
Hilbert space methods for reduced-rank Gaussian process regression
• Computer Science, Mathematics
Stat. Comput.
• 2020
The method is compared to previously proposed methods both theoretically and through empirical tests on simulated and real data; the approximation is shown to become exact as the size of the compact subset and the number of eigenfunctions go to infinity.
Sparse Gaussian Processes using Pseudo-inputs
• Computer Science
NIPS
• 2005
It is shown that this new Gaussian process (GP) regression model can match full GP performance with small $M$, i.e., very sparse solutions, and it significantly outperforms other approaches in this regime.
Practical Posterior Error Bounds from Variational Objectives
• Computer Science
ArXiv
• 2019
This paper provides rigorous bounds on the error of posterior mean and uncertainty estimates that arise from full-distribution approximations, as in variational inference.
Two problems with variational expectation maximisation for time-series models