# Fast Sparse Gaussian Process Methods: The Informative Vector Machine

@inproceedings{Lawrence2002FastSG, title={Fast Sparse Gaussian Process Methods: The Informative Vector Machine}, author={Neil D. Lawrence and Matthias W. Seeger and Ralf Herbrich}, booktitle={NIPS}, year={2002} }

We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d²), and in large real-world…
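The forward-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes GP regression with Gaussian noise (the paper's own likelihoods and approximations are not given in the truncated abstract), where the entropy reduction from including point i is 0.5·log(1 + var_i/σ²), so the greedy step simply picks the point with the largest current posterior variance. The names `rbf_kernel`, `greedy_select`, `noise_var`, and `lengthscale` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel with unit prior variance (an assumed choice)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def greedy_select(X, d, noise_var=0.1, lengthscale=1.0):
    """Greedily pick d points by entropy reduction (Gaussian-noise assumption).

    Keeps a d x n factor M with posterior covariance = K - M.T @ M, so each
    step costs O(n * k) and the whole loop O(n * d^2) time, O(n * d) memory.
    """
    n = X.shape[0]
    var = np.ones(n)                  # prior variances, k(x, x) = 1 for this kernel
    M = np.zeros((d, n))
    selected = np.zeros(n, dtype=bool)
    active = []
    for k in range(d):
        # Differential-entropy reduction of including each remaining point.
        score = 0.5 * np.log1p(var / noise_var)
        score[selected] = -np.inf
        i = int(np.argmax(score))
        # Column i of the current posterior covariance, without forming K.
        k_i = rbf_kernel(X[i:i + 1], X, lengthscale)[0]
        s_i = k_i - M[:k].T @ M[:k, i]
        # Rank-one update for conditioning on a noisy observation at point i.
        M[k] = s_i / np.sqrt(var[i] + noise_var)
        var = np.maximum(var - M[k] ** 2, 0.0)
        selected[i] = True
        active.append(i)
    return active, M, var
```

Only one kernel column per selected point is ever computed, which is where the memory saving over the full n × n kernel matrix comes from; predictions then need kernel evaluations against the d selected points only.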

## 556 Citations

Sparse Gaussian Processes using Pseudo-inputs

- Computer Science, NIPS
- 2005

It is shown that this new Gaussian process (GP) regression model can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.

Sparse gaussian processes for large-scale machine learning

- Computer Science
- 2011

This thesis presents several novel sparse GP models that compare favorably with SPGP, both in terms of predictive performance and error bar quality, and provides two broad classes of models: Marginalized Networks (MNs) and Inter-Domain GPs (IDGPs).

Fast Forward Selection to Speed Up Sparse Gaussian Process Regression

- Computer Science, AISTATS
- 2003

A method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection, which leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically.

Validation-Based Sparse Gaussian Process Classifier Design

- Computer Science, Neural Computation
- 2009

The proposed method uses a negative log predictive (NLP) loss measure, which is easy to compute for GP models, and uses this measure for both basis vector selection and hyperparameter adaptation.

Sparse Gaussian Process Classification With Multiple Classes

- Computer Science
- 2004

This work shows how to generalize the binary classification informative vector machine (IVM) to multiple classes. The method is a principled approximation to Bayesian inference which yields valid uncertainty estimates and allows for hyperparameter adaptation via marginal likelihood maximization.

Efficient Nonparametric Bayesian Modelling with Sparse Gaussian Process Approximations

- Computer Science
- 2006

A general framework based on the informative vector machine (IVM) is presented and it is shown how the complete Bayesian task of inference and learning of free hyperparameters can be performed in a practically efficient manner.

Fast large scale Gaussian process regression using the improved fast Gauss transform

- Computer Science
- 2006

An ε-exact approximation technique, the improved fast Gauss transform, and the theory of inexact Krylov subspace methods are used to reduce the computational complexity to O(N), for the squared exponential covariance function.

Flexible and efficient Gaussian process models for machine learning

- Computer Science
- 2007

Several new techniques are developed that reduce the O(N³) complexity of Gaussian process models and relax the Gaussianity assumption of the process by learning a nonlinear transformation of the output space.

Sparse nonlinear methods for predicting structured data

- Computer Science
- 2012

The goals of this work are to develop nonlinear, nonparametric modelling techniques for structure learning and prediction problems in which there are structured dependencies among the observed data, and to equip the authors' models with sparse representations which serve both to handle prior sparse connectivity assumptions and to reduce computational complexity.

Fast large scale Gaussian process regression using approximate matrix-vector products

- Computer Science
- 2006

This work considers the use of ε-exact matrix-vector product algorithms to reduce the computational complexity of Gaussian processes to O(N), and shows how to choose ε to guarantee the convergence of the iterative methods.

## References

Sparse On-Line Gaussian Processes

- Computer Science, Neural Computation
- 2002

An approach for sparse representations of Gaussian process (GP) models (which are Bayesian types of kernel machines) is developed in order to overcome their limitations for large data sets. It is based on a combination of a Bayesian on-line algorithm and a sequential construction of a relevant subsample of data that fully specifies the prediction of the GP model.

Sparse Greedy Gaussian Process Regression

- Computer Science, NIPS
- 2000

A simple sparse greedy technique is presented to approximate the maximum a posteriori estimate of Gaussian processes with much improved scaling behaviour in the sample size m, together with applications to large-scale problems.

Gaussian Processes for Classification: Mean-Field Algorithms

- Computer Science, Neural Computation
- 2000

A mean-field algorithm for binary classification with Gaussian processes is derived, based on the TAP approach originally proposed in the statistical physics of disordered systems, and an approximate leave-one-out estimator for the generalization error is computed.

A Sparse Bayesian Compression Scheme - The Informative Vector Machine

- Computer Science
- 2001

The framework presented here makes use of the Bayesian method to determine how much information is gained from each data-point, and aims at extracting the maximum amount of information from the minimum number of data-points.

Bayesian methods for adaptive models

- Computer Science
- 1992

The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non-linear models, and it is shown that the careful incorporation of error bar information into a classifier's predictions yields improved performance.

A Bayesian Committee Machine

- Computer Science, Neural Computation
- 2000

It is found that the performance of the BCM improves if several test points are queried at the same time and is optimal if the number of test points is at least as large as the degrees of freedom of the estimator.

Fast Training of Support Vector Machines Using Sequential Minimal Optimization (in Advances in Kernel Methods)

- Computer Science
- 1999

SMO breaks the large quadratic programming (QP) problem of SVM training into a series of smallest possible QP problems, which avoids using a time-consuming numerical QP optimization as an inner loop; hence SMO is fastest for linear SVMs and sparse data sets.

A family of algorithms for approximate Bayesian inference

- Computer Science
- 2001

This thesis presents an approximation technique that can perform Bayesian inference faster and more accurately than previously possible, and is found to be convincingly better than rival approximation techniques: Monte Carlo, Laplace's method, and variational Bayes.

Using the Nyström Method to Speed Up Kernel Machines

- Computer Science, NIPS
- 2000

It is shown that an approximation to the eigendecomposition of the Gram matrix can be computed by the Nyström method (which is used for the numerical solution of eigenproblems) and the computational complexity of a predictor using this approximation is O(m²n).