Rates of Convergence for Sparse Variational Gaussian Process Regression

@inproceedings{Burt2019RatesOC,
  title={Rates of Convergence for Sparse Variational Gaussian Process Regression},
  author={David R. Burt and Carl Edward Rasmussen and Mark van der Wilk},
  booktitle={ICML},
  year={2019}
}
Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We… 
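The $\mathcal{O}\left(NM^2\right)$ figure refers to evaluating the collapsed evidence lower bound of Titsias (2009), which this paper analyses. As a rough illustration only (a minimal NumPy sketch under assumed choices of kernel, jitter and inducing inputs, not the authors' code), the bound can be computed without ever forming an $N \times N$ matrix:

import numpy as np

def rbf_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    # Squared-exponential kernel; an assumption for this sketch, any PSD kernel works.
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sgpr_elbo(X, y, Z, noise_var=0.1, **kern):
    # Collapsed sparse-GP-regression bound (Titsias, 2009): every dense operation
    # below is O(N M^2) or cheaper, which is the scaling the abstract refers to.
    N, M = X.shape[0], Z.shape[0]
    Kmm = rbf_kernel(Z, Z, **kern) + 1e-8 * np.eye(M)      # M x M, with jitter
    Kmn = rbf_kernel(Z, X, **kern)                          # M x N
    knn_diag = np.full(N, kern.get("variance", 1.0))        # diag of Knn for an RBF kernel

    L = np.linalg.cholesky(Kmm)                              # O(M^3)
    A = np.linalg.solve(L, Kmn) / np.sqrt(noise_var)        # M x N, O(N M^2)
    B = np.eye(M) + A @ A.T                                  # O(N M^2)
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / np.sqrt(noise_var)

    # log N(y | 0, Qnn + noise_var*I) via the matrix determinant/inversion lemmas
    log_marginal = -0.5 * (N * np.log(2.0 * np.pi * noise_var)
                           + 2.0 * np.sum(np.log(np.diag(LB)))
                           + y @ y / noise_var - c @ c)
    # -1/(2 sigma^2) * tr(Knn - Qnn), the term that penalises a poor choice of Z
    trace_term = -0.5 * (np.sum(knn_diag) / noise_var - np.sum(A * A))
    return log_marginal + trace_term

# Hypothetical usage: M = 20 inducing inputs chosen at random from N = 500 points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
Z = X[rng.choice(500, size=20, replace=False)]
print(sgpr_elbo(X, y, Z, noise_var=0.01))

How large M must be for this bound to approach the exact log marginal likelihood as N grows is exactly the question the paper addresses.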


Convergence of Sparse Variational Inference in Gaussian Processes Regression
TLDR
It is shown that the KL divergence between the approximate model and the exact posterior can be made arbitrarily small for a Gaussian-noise regression model with $M\ll N$, and it is characterised how M needs to grow with N to ensure high-quality approximations.
Variational Orthogonal Features
TLDR
A construction of features for any stationary prior kernel is presented that allows computation of an unbiased estimator of the ELBO using $T$ Monte Carlo samples in $\mathcal{O}(M^3)$, and in $\mathcal{O}(\tilde{N}T+MT)$ with an additional approximation, and the impact of this additional approximation on inference quality is analyzed.
Exact sampling of determinantal point processes with sublinear time preprocessing
TLDR
A new algorithm is proposed which samples exactly from a determinantal point process while satisfying the following two properties: its preprocessing cost is $n \cdot \text{poly}(k)$, and its sampling cost is independent of the size of $\mathbf{L}$.
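For context on what the preprocessing and sampling costs refer to: the classical exact DPP sampler requires an eigendecomposition of the full kernel $\mathbf{L}$, i.e. cubic-time preprocessing, which is precisely what the cited algorithm avoids. A minimal NumPy sketch of that classical baseline (the spectral sampler of Hough et al., shown here only as a point of comparison, not the cited method) is:

import numpy as np

def sample_dpp_spectral(L, rng=None):
    # Classical exact DPP sampling: O(n^3) eigendecomposition up front,
    # the preprocessing cost that the cited algorithm reduces to n*poly(k).
    rng = np.random.default_rng() if rng is None else rng
    eigvals, eigvecs = np.linalg.eigh(L)
    # Phase 1: keep eigenvector i with probability lambda_i / (1 + lambda_i).
    keep = rng.random(len(eigvals)) < eigvals / (1.0 + eigvals)
    V = eigvecs[:, keep]
    sample = []
    # Phase 2: draw one item per retained eigenvector.
    while V.shape[1] > 0:
        probs = np.sum(V**2, axis=1)
        probs /= probs.sum()
        i = rng.choice(len(probs), p=probs)
        sample.append(i)
        # Project the remaining basis onto the complement of e_i and re-orthonormalise.
        j = np.argmax(np.abs(V[i, :]))
        V = V - np.outer(V[:, j] / V[i, j], V[i, :])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(sample)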
Generalized Local Aggregation for Large Scale Gaussian Process Regression
TLDR
This work generalizes the traditional mutual-information-based aggregation methods (GPoE, RBCM, GRBCM) using Tsallis mutual information and proposes three heuristic algorithms for large-scale Gaussian process regression.
Variational Gaussian Process Models without Matrix Inverses
TLDR
A variational lower bound is provided that can be computed without expensive matrix operations like inversion, can be used as a drop-in replacement for the existing variational method of Hensman et al. (2013, 2015), and can therefore be applied directly in a wide variety of models, such as deep GPs.
Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning
TLDR
This work provides novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation, which lead to improved error bounds in both regression and optimization.
On Negative Transfer and Structure of Latent Functions in Multi-output Gaussian Processes
TLDR
This article first defines negative transfer in the context of a multi-output Gaussian process (MGP), then derives necessary conditions for an MGP model to avoid negative transfer, and proposes two latent structures that scale to arbitrarily large datasets, can avoid negative transfer, and allow any kernel or sparse approximation to be used within them.
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits
TLDR
Numerical studies on both simulated and real datasets demonstrate that mini-batch SGD generalizes better than state-of-the-art GP methods while reducing the computational burden, opening up a new, previously unexplored, data-size regime for GPs.
Consistent Online Gaussian Process Regression Without the Sample Complexity Bottleneck
  • Alec Koppel
  • 2019 American Control Conference (ACC)
TLDR
This work develops the first compression sub-routine for online Gaussian processes that preserves their convergence to the population posterior, i.e., asymptotic posterior consistency, while ameliorating their intractable complexity growth with the sample size.
Direct loss minimization for sparse Gaussian processes
TLDR
The DLM algorithm for sparse GPs (sGP) is developed, and it is shown that, with appropriate hyperparameter optimization, it provides a significant improvement over the variational approach, and that optimizing the sGP for log loss improves the mean squared error in regression.

References

Showing 1-10 of 43 references
Exact sampling of determinantal point processes with sublinear time preprocessing
TLDR
A new algorithm is proposed which samples exactly from a determinantal point process while satisfying the following two properties: its preprocessing cost is $n \cdot \text{poly}(k)$, and its sampling cost is independent of the size of $\mathbf{L}$.
Variational Fourier Features for Gaussian Processes
TLDR
This work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances, and derives these expressions for Matérn kernels in one dimension, generalizing to more dimensions using kernels with specific structures.
Fast Randomized Kernel Ridge Regression with Statistical Guarantees
TLDR
A version of this approach that comes with running-time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented that computes coarse approximations to the required leverage scores in time linear in the number of samples.
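The scores referred to here are ridge leverage scores. As an illustrative sketch only (using one common definition of the scores and computing them exactly in cubic time, whereas the cited work's contribution is approximating them in time linear in the number of samples), leverage-score-based Nyström column sampling can be written as:

import numpy as np

def ridge_leverage_scores(K, lam):
    # Exact lambda-ridge leverage scores diag(K (K + n*lam*I)^{-1}); O(n^3).
    # By symmetry of K this equals diag((K + n*lam*I)^{-1} K), computed below.
    n = K.shape[0]
    return np.diag(np.linalg.solve(K + n * lam * np.eye(n), K)).copy()

def nystrom_from_scores(K, scores, m, rng=None):
    # Sample m columns with probability proportional to the scores and form
    # the standard rank-m Nystrom approximation of K.
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(scores), size=m, replace=False, p=scores / scores.sum())
    C = K[:, idx]                               # n x m
    W = K[np.ix_(idx, idx)] + 1e-8 * np.eye(m)  # m x m block, with jitter
    return C @ np.linalg.solve(W, C.T)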
On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes
TLDR
A substantial generalization of the variational framework for learning inducing variables is given, along with a new proof of the key result for infinite index sets, which allows inducing points that are not data points and likelihoods that depend on all function values.
Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees
TLDR
This work develops an approach to scalable approximate GP regression with finite-data guarantees on the accuracy of pointwise posterior mean and variance estimates, and introduces a novel objective for approximate inference in the nonparametric setting: the preconditioned Fisher (pF) divergence.
Scalable Gaussian process inference using variational methods
TLDR
Various theoretical issues arising from the application of variational inference to the infinite-dimensional Gaussian process setting are settled decisively, and a new argument for existing approaches to variational regression is given that settles debate about their applicability.
Modified log-Sobolev inequalities for strong-Rayleigh measures
We establish universal modified log-Sobolev inequalities for reversible Markov chains on the boolean lattice $\{0,1\}^n$, under the only assumption that the invariant law $\pi$ satisfies a form of…
Spectral methods in machine learning and new strategies for very large datasets
TLDR
Two new algorithms for the approximation of positive-semidefinite kernels based on the Nyström method are presented, and the improved performance of the approach relative to existing methods is demonstrated.
Two problems with variational expectation maximisation for time-series models
Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground, retaining distributional information about uncertainty in…
Information Consistency of Nonparametric Gaussian Process Methods
TLDR
By focussing on the concept of information consistency for Bayesian Gaussian process (GP) models, consistency results and convergence rates are obtained via a regret bound on cumulative log loss.