• Corpus ID: 235294150

Connections and Equivalences between the Nyström Method and Sparse Variational Gaussian Processes

Veit Wild, Motonobu Kanagawa, D. Sejdinovic
We investigate the connections between sparse approximation methods for making kernel methods and Gaussian processes (GPs) scalable to massive data, focusing on the Nyström method and the Sparse Variational Gaussian Processes (SVGP). While sparse approximation methods for GPs and kernel methods share some algebraic similarities, the literature lacks a deep understanding of how and why they are related. This is a possible obstacle for the communications between the GP and kernel communities… 
Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning
Novel confidence intervals are provided for the Nyström method and the sparse variational Gaussian process approximation method, which are established using novel interpretations of the approximate (surrogate) posterior variance of the models.
Posterior and Computational Uncertainty in Gaussian Processes
A new class of methods is developed that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended, and the consequences of ignoring computational uncertainty are demonstrated.
Variational Gaussian Processes: A Functional Analysis View
This work proposes to view the GP as lying in a Banach space which then facilitates a unified perspective and is used to understand the relationship between existing features and to draw a connection between kernel ridge regression and variational GP approximations.
Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domain by Adaptive Discretization
Ada-BKB (Adaptive Budgeted Kernelized Bandit) is a no-regret Gaussian process optimization algorithm for functions on continuous domains that provably runs in O(…), where d_eff is the effective dimension of the explored space, which is typically much smaller than T.


Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences
This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian…
On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes
This work substantially generalizes the literature on the variational framework for learning inducing variables and gives a new proof of the result for infinite index sets, which allows inducing points that are not data points and likelihoods that depend on all function values.
Sparse within Sparse Gaussian Processes using Neighbor Information
This work introduces a novel hierarchical prior that imposes sparsity on the set of inducing variables, making it possible to use sparse GPs with a large number of inducing points without incurring a prohibitive computational cost.
A Tutorial on Sparse Gaussian Processes and Variational Inference
This tutorial provides access to the basic material for readers without prior knowledge of either GPs or VI; pseudo-training examples are treated as optimization arguments of the approximate posterior and are identified jointly with the hyperparameters of the generative model.
Spectral methods in machine learning and new strategies for very large datasets
Two new algorithms for the approximation of positive-semidefinite kernels based on the Nyström method are presented, each of which demonstrates the improved performance of the approach relative to existing methods.
Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations
This work shows that, by revisiting old model approximations such as the fully-independent training conditionals endowed with powerful sampling-based inference methods, treating both inducing locations and GP hyper-parameters in a Bayesian way can improve performance significantly.
Fast Statistical Leverage Score Approximation in Kernel Ridge Regression
A linear time (modulo polylog terms) algorithm is proposed to accurately approximate the statistical leverage scores in the stationary-kernel-based KRR with theoretical guarantees and is orders of magnitude more efficient than existing methods in selecting the representative sub-samples in the Nyström approximation.
Convergence of Sparse Variational Inference in Gaussian Processes Regression
It is shown that the KL divergence between the approximate model and the exact posterior can be made arbitrarily small for a Gaussian-noise regression model, and the results characterize how the number of inducing points M needs to grow with the dataset size N to ensure high-quality approximations.
Large-scale SVD and manifold learning
The authors' comparisons show that the Nyström approximation is superior to the column-sampling method for this task, and approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE data set.
Revisiting the Nyström Method for Improved Large-scale Machine Learning
An empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices is complemented by a suite of worst-case theoretical bounds for both random sampling and random projection methods.