# Rates of Convergence for Sparse Variational Gaussian Process Regression

@inproceedings{Burt2019RatesOC, title={Rates of Convergence for Sparse Variational Gaussian Process Regression}, author={David R. Burt and Carl Edward Rasmussen and Mark van der Wilk}, booktitle={ICML}, year={2019} }

Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We…
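To make the cost claim concrete, here is a minimal NumPy sketch of a collapsed sparse variational lower bound of the kind introduced by Titsias (2009), evaluated in $\mathcal{O}\left(NM^2\right)$ time and compared against the $\mathcal{O}\left(N^3\right)$ exact log marginal likelihood. The squared exponential kernel, the hyperparameter defaults, the jitter term, and the function names `rbf_kernel`, `sgpr_elbo`, and `exact_log_marginal` are assumptions chosen for illustration; none of them is taken from the paper.

```python
import numpy as np


def rbf_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared exponential kernel matrix between the rows of X1 and X2."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def exact_log_marginal(X, y, noise_var=0.1, variance=1.0, lengthscale=1.0):
    """Exact GP log marginal likelihood; the Cholesky factorisation costs O(N^3)."""
    N = X.shape[0]
    K = rbf_kernel(X, X, variance, lengthscale) + noise_var * np.eye(N)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * N * np.log(2.0 * np.pi))


def sgpr_elbo(X, y, Z, noise_var=0.1, variance=1.0, lengthscale=1.0, jitter=1e-8):
    """Collapsed sparse variational bound with M inducing inputs Z, in O(N M^2)."""
    N, M = X.shape[0], Z.shape[0]
    Kuu = rbf_kernel(Z, Z, variance, lengthscale) + jitter * np.eye(M)
    Kuf = rbf_kernel(Z, X, variance, lengthscale)
    trace_Kff = N * variance  # diagonal of Kff is constant for this kernel

    L = np.linalg.cholesky(Kuu)                       # O(M^3)
    A = np.linalg.solve(L, Kuf) / np.sqrt(noise_var)  # M x N, so Qff / noise_var = A.T @ A
    B = np.eye(M) + A @ A.T                           # O(N M^2), the dominant cost
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / np.sqrt(noise_var)

    # log N(y | 0, Qff + noise_var * I), via the matrix inversion and determinant lemmas
    log_gauss = (-0.5 * N * np.log(2.0 * np.pi)
                 - np.sum(np.log(np.diag(LB)))
                 - 0.5 * N * np.log(noise_var)
                 - 0.5 * (y @ y) / noise_var
                 + 0.5 * (c @ c))
    # trace penalty: -(1 / (2 * noise_var)) * tr(Kff - Qff)
    trace_term = -0.5 * trace_Kff / noise_var + 0.5 * np.sum(A**2)
    return log_gauss + trace_term
```

The gap between `exact_log_marginal` and `sgpr_elbo` equals the KL divergence from the approximate posterior to the exact one, so tracking how that gap shrinks as inducing inputs are added gives an empirical view of how large $M$ has to be for a given $N$.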

## 89 Citations

Convergence of Sparse Variational Inference in Gaussian Processes Regression

- Computer Science
- J. Mach. Learn. Res.
- 2020

It is shown that the KL-divergence between the approximate model and the exact posterior can be made arbitrarily small for a Gaussian-noise regression model, and how $M$ needs to grow with $N$ to ensure high-quality approximations is characterized.

Variational Orthogonal Features

- Computer Science
- ArXiv
- 2020

A construction of features for any stationary prior kernel that allow for computation of an unbiased estimator to the ELBO using Monte Carlo samples in $\mathcal{O}(M^3)$ and in $\mathcal{O}(\tilde{N}T+MT)$ with an additional approximation is presented and the impact of this additional approximation on inference quality is analyzed.

Exact sampling of determinantal point processes with sublinear time preprocessing

- Mathematics, Computer Science
- NeurIPS
- 2019

A new algorithm is proposed which samples exactly from a determinantal point process while satisfying the following two properties: its preprocessing cost is $n \cdot \text{poly}(k)$ and its sampling cost is independent of the size of $\mathbf{L}$.

Generalized Local Aggregation for Large Scale Gaussian Process Regression

- Computer Science
- 2020 International Joint Conference on Neural Networks (IJCNN)
- 2020

This work generalizes the traditional mutual-information-based methods (GPoE, RBCM, GRBCM) based on Tsallis mutual information and proposes three heuristic algorithms to solve the model of Gaussian process regression.

Variational Gaussian Process Models without Matrix Inverses

- Computer Science
- AABI
- 2019

A variational lower bound is provided that can be computed without expensive matrix operations like inversion and can be used as a drop-in replacement for the existing variational method of Hensman et al. (2013, 2015); it can therefore be applied directly in a wide variety of models, such as deep GPs.

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

- Computer Science
- ArXiv
- 2022

This work provides novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method, leading to improved error bounds in both regression and optimization.

On Negative Transfer and Structure of Latent Functions in Multi-output Gaussian Processes

- Computer Science
- ArXiv
- 2020

This article first defines negative transfer in the context of an $\mathcal{MGP}$, then derives necessary conditions for an $\mathcal{MGP}$ model to avoid negative transfer and proposes two latent structures that scale to arbitrarily large datasets, can avoid negative transfer, and allow any kernel or sparse approximation to be used within.

Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits

- Computer Science
- ArXiv
- 2021

Numerical studies on both simulated and real datasets demonstrate that minibatch SGD has better generalization over state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored, data size regime for GPs.

Consistent Online Gaussian Process Regression Without the Sample Complexity Bottleneck

- Computer Science
- 2019 American Control Conference (ACC)
- 2019

This work develops the first compression sub-routine for online Gaussian processes that preserves their convergence to the population posterior, i.e., asymptotic posterior consistency, while ameliorating their intractable complexity growth with the sample size.

Direct loss minimization for sparse Gaussian processes

- Computer Science
- AISTATS
- 2021

The DLM algorithm for sGP is developed and it is shown that with appropriate hyperparameter optimization it provides a significant improvement over the variational approach, and optimizing sGP for log loss improves the mean square error in regression.

## References


Exact sampling of determinantal point processes with sublinear time preprocessing

- Mathematics, Computer Science
- NeurIPS
- 2019

A new algorithm is proposed which samples exactly from a determinantal point process while satisfying the following two properties: its preprocessing cost is $n \cdot \text{poly}(k)$ and its sampling cost is independent of the size of $\mathbf{L}$.

Variational Fourier Features for Gaussian Processes

- Computer Science
- J. Mach. Learn. Res.
- 2017

This work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances, derives these expressions for Matérn kernels in one dimension, and generalizes to more dimensions using kernels with specific structures.

Fast Randomized Kernel Ridge Regression with Statistical Guarantees

- Computer Science
- NIPS
- 2015

A version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to quickly compute coarse approximations to these scores in time linear in the number of samples.

On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes

- Computer Science
- AISTATS
- 2016

A substantial generalization of the literature on the variational framework for learning inducing variables is given, along with a new proof of the result for infinite index sets which allows inducing points that are not data points and likelihoods that depend on all function values.

Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees

- Computer Science
- AISTATS
- 2019

This work develops an approach to scalable approximate GP regression with finite-data guarantees on the accuracy of pointwise posterior mean and variance estimates, and introduces a novel objective for approximate inference in the nonparametric setting: the preconditioned Fisher (pF) divergence.

Scalable Gaussian process inference using variational methods

- Computer Science
- 2017

Various theoretical issues arising from the application of variational inference to the infinite dimensional Gaussian process setting are settled decisively and a new argument for existing approaches to variational regression that settles debate about their applicability is given.

Modified log-Sobolev inequalities for strong-Rayleigh measures

- Mathematics
- 2019

We establish universal modified log-Sobolev inequalities for reversible Markov chains on the boolean lattice $\{0,1\}^n$, under the only assumption that the invariant law $\pi$ satisfies a form of…

Spectral methods in machine learning and new strategies for very large datasets

- Computer Science
- Proceedings of the National Academy of Sciences
- 2009

Two new algorithms for the approximation of positive-semidefinite kernels based on the Nyström method are presented, each of which demonstrates the improved performance of the approach relative to existing methods.

Two problems with variational expectation maximisation for time-series models

- Business
- 2011

Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground, retaining distributional information about uncertainty in…

Information Consistency of Nonparametric Gaussian Process Methods

- Computer Science
- IEEE Transactions on Information Theory
- 2008

By focussing on the concept of information consistency for Bayesian Gaussian process (GP) models, consistency results and convergence rates are obtained via a regret bound on cumulative log loss.