Corpus ID: 231918685

Bias-Free Scalable Gaussian Processes via Randomized Truncations

@article{Potapczynski2021BiasFreeSG,
  title={Bias-Free Scalable Gaussian Processes via Randomized Truncations},
  author={Andres Potapczynski and Luhuan Wu and Dan Biderman and Geoff Pleiss and John P. Cunningham},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.06695}
}
Scalable Gaussian Process methods are computationally attractive, yet introduce modeling biases that require rigorous study. This paper analyzes two common techniques: early truncated conjugate gradients (CG) and random Fourier features (RFF). We find that both methods introduce a systematic bias on the learned hyperparameters: CG tends to underfit while RFF tends to overfit. We address these issues using randomized truncation estimators that eliminate bias in exchange for increased variance… 
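
As a concrete illustration of randomized truncation (which the paper applies to CG iterations and to random Fourier features), here is a minimal sketch of a generic "Russian roulette" estimator of a truncated sum: the truncation level is random and each included term is reweighted by its inclusion probability, so the bias of a fixed early truncation disappears in exchange for extra variance. The function name and the geometric stopping rule are illustrative, not the paper's exact estimators.

```python
import numpy as np

def russian_roulette_sum(term, rng, q=0.9):
    """Single-sample 'Russian roulette' estimate of sum_{i >= 0} term(i).
    The truncation level is geometric with continuation probability q, and
    each included term is divided by P(level >= i), making the estimate
    unbiased at the cost of increased variance."""
    estimate, survival, i = 0.0, 1.0, 0   # survival = P(level >= i)
    while True:
        estimate += term(i) / survival
        if rng.random() > q:              # stop with probability 1 - q
            return estimate
        survival *= q
        i += 1

# Toy check: sum_{i >= 0} 0.5**i = 2. Averaging many single-sample estimates
# recovers the exact value, unlike any fixed early truncation of the series.
rng = np.random.default_rng(0)
estimates = [russian_roulette_sum(lambda i: 0.5 ** i, rng) for _ in range(20000)]
print(np.mean(estimates))   # ≈ 2.0
```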

Citations

Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning
TLDR
It is proved that preconditioning has an additional, previously unexplored benefit: it not only reduces the bias of the log-marginal likelihood estimator and its derivatives, but can also simultaneously reduce their variance at essentially negligible cost.
Rectangular Flows for Manifold Learning
TLDR
Two methods relying on tricks from automatic differentiation and numerical linear algebra to either evaluate or approximate the full likelihood objective are proposed, performing end-to-end manifold learning and density estimation.
SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes
TLDR
This work develops a connection between SKI and the permutohedral lattice used for high-dimensional fast bilateral filtering, and provides a CUDA implementation of Simplex-GP, which enables significant GPU acceleration of MVM-based inference.
When are Iterative Gaussian Processes Reliably Accurate?
TLDR
This work investigates CG tolerance, preconditioner rank, and Lanczos decompositions, and shows that L-BFGS-B is a compelling optimizer for Iterative GPs, achieving convergence with fewer gradient updates.
Barely Biased Learning for Gaussian Process Regression
TLDR
This work suggests a method that adaptively selects the amount of computation to use when estimating the log marginal likelihood so that the bias of the objective function is guaranteed to be small.
Neural Implicit Manifold Learning for Topology-Aware Generative Modelling
TLDR
Constrained energy-based models are introduced, which use a constrained variant of Langevin dynamics to train and sample within a learned manifold and can learn manifold-supported distributions with complex topologies more accurately than pushforward models.
Posterior and Computational Uncertainty in Gaussian Processes
TLDR
A new class of methods is developed that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended, and the consequences of ignoring computational uncertainty are demonstrated.
Matrix Inversion free variational inference in Conditional Student’s T Processes
TLDR
Building on the lower bound of van der Wilk et al. (2020), which can be computed without computationally expensive matrix operations such as inversions and log determinants, a computationally efficient approximate posterior over covariance matrices is proposed within the probabilistic framework of Student's T Processes (STP) (Shah et al., 2014).
Variational Nearest Neighbor Gaussian Processes
TLDR
This work proposes the variational nearest neighbor Gaussian process (VNNGP), which introduces a prior that only retains correlations within the K nearest-neighboring observations, thereby inducing sparse precision structure and enabling stochastic optimization with a time complexity of O(K³).
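
For intuition, the nearest-neighbor prior idea that VNNGP builds on can be sketched with the classical Vecchia-style construction: each latent value is conditioned only on its K nearest previously ordered neighbors, which makes the Cholesky factor of the prior precision sparse. This is a hedged sketch of that building block, not the VNNGP variational algorithm; all names (vecchia_style_precision, rbf) are ad hoc.

```python
import numpy as np

def vecchia_style_precision(X, kernel, K=5):
    """Sparse-precision GP prior in which each f_i depends only on its K
    nearest neighbors among earlier-ordered points (Vecchia/NNGP idea).
    Returns the precision matrix B^T diag(1/D) B; B is stored densely here
    purely for clarity, even though it is sparse by construction."""
    n = X.shape[0]
    B = np.eye(n)      # lower-triangular factor with -weights off-diagonal
    D = np.zeros(n)    # conditional variances
    for i in range(n):
        prev = np.arange(i)
        if prev.size == 0:
            D[i] = kernel(X[i:i+1], X[i:i+1])[0, 0]
            continue
        dists = np.linalg.norm(X[prev] - X[i], axis=1)
        nbrs = prev[np.argsort(dists)[:K]]
        K_nn = kernel(X[nbrs], X[nbrs])
        k_in = kernel(X[nbrs], X[i:i+1])[:, 0]
        w = np.linalg.solve(K_nn + 1e-8 * np.eye(len(nbrs)), k_in)
        B[i, nbrs] = -w
        D[i] = kernel(X[i:i+1], X[i:i+1])[0, 0] - k_in @ w
    return B.T @ np.diag(1.0 / D) @ B

rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
X = np.random.default_rng(0).normal(size=(50, 2))
P = vecchia_style_precision(X, rbf, K=5)   # sparse precision (dense storage)
```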
Scaling Structured Inference with Randomization
TLDR
A family of randomized dynamic programming (RDP) algorithms is proposed for scaling structured models to tens of thousands of latent states and for preventing posterior collapse when using RDP to train a scaled structured VAE.
…

References

Showing 1-10 of 42 references
Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE)
TLDR
An adaptation of the Stochastic Gradient Langevin Dynamics algorithm is proposed to draw samples from the posterior distribution over covariance parameters with negligible bias and without the need to compute the marginal likelihood.
Constant-Time Predictive Distributions for Gaussian Processes
TLDR
This paper addresses shortcomings in GP marginal likelihood and posterior mean computations by using the Lanczos algorithm to rapidly approximate the predictive covariance matrix and substantially improves time and space complexity.
Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes
TLDR
Numerical studies on both simulated and real datasets demonstrate that minibatch SGD has better generalization over state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored, data size regime for GPs.
Sparse Gaussian Processes using Pseudo-inputs
TLDR
It is shown that this new Gaussian process (GP) regression model can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.
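
For context, pseudo-input (inducing-point) methods replace the full kernel matrix with a low-rank surrogate built from M inducing locations. The Nyström-style approximation below is the basic building block of this family; the SPGP model of this paper additionally corrects the diagonal and learns the pseudo-inputs by optimizing the marginal likelihood. Names and the toy usage are illustrative.

```python
import numpy as np

def nystrom_kernel_approx(X, Z, kernel, jitter=1e-8):
    """Low-rank approximation K(X, X) ≈ K_xz K_zz^{-1} K_zx induced by a
    small set of inducing/pseudo-input locations Z."""
    Kzz = kernel(Z, Z) + jitter * np.eye(len(Z))
    Kxz = kernel(X, Z)
    return Kxz @ np.linalg.solve(Kzz, Kxz.T)

rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Z = X[rng.choice(200, size=20, replace=False)]   # M = 20 pseudo-inputs
approx_err = np.abs(nystrom_kernel_approx(X, Z, rbf) - rbf(X, X)).max()
```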
On the Error of Random Fourier Features
TLDR
The uniform error bound on random Fourier features from the original paper is improved, and novel understandings are given of the embedding's variance, approximation error, and use in some machine learning methods.
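
As a reference point for the error analysis summarized above, here is a minimal random Fourier features sketch for the RBF kernel; the Monte Carlo approximation error of the kernel matrix shrinks roughly as O(1/sqrt(m)) in the number of features m. Names and the toy error check are illustrative.

```python
import numpy as np

def rff_features(X, num_features, lengthscale=1.0, rng=None):
    """Random Fourier features for the RBF kernel
    k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2)):
    Phi @ Phi.T is an unbiased Monte Carlo approximation of k(X, X)."""
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    W = rng.normal(size=(d, num_features)) / lengthscale   # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)   # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
for m in (64, 256, 1024):
    Phi = rff_features(X, m, rng=rng)
    print(m, np.abs(Phi @ Phi.T - K_exact).max())   # error decays with m
```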
Randomized Automatic Differentiation
TLDR
This work develops a general framework and approach for randomized automatic differentiation (RAD), which allows unbiased gradient estimates to be computed with reduced memory in return for variance, and examines limitations of the general approach and argues that it must leverage problem specific structure to realize benefits.
Efficiently sampling functions from Gaussian process posteriors
TLDR
This work identifies a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data, and proposes an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time.
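
The decomposition referred to here is often written via Matheron's rule (pathwise conditioning): a posterior sample equals a prior sample plus a data-dependent correction. The sketch below draws the joint prior sample exactly for clarity, whereas the point of the paper is that the prior term can be handled by cheap approximations (e.g. Fourier features) while the correction uses the data term; all names are illustrative.

```python
import numpy as np

def pathwise_posterior_sample(Xtest, Xtrain, y, kernel, noise, rng):
    """One GP posterior sample via Matheron's rule:
    f_post(x*) = f_prior(x*) + K(x*, X)(K(X, X) + s^2 I)^{-1}(y - f_prior(X) - eps)."""
    n, m = len(Xtrain), len(Xtest)
    Xall = np.vstack([Xtrain, Xtest])
    Kall = kernel(Xall, Xall) + 1e-6 * np.eye(n + m)        # jitter for stability
    f = np.linalg.cholesky(Kall) @ rng.normal(size=n + m)   # joint prior draw
    f_train, f_test = f[:n], f[n:]
    eps = rng.normal(scale=np.sqrt(noise), size=n)          # simulated noise
    Ktt = kernel(Xtrain, Xtrain) + noise * np.eye(n)
    update = kernel(Xtest, Xtrain) @ np.linalg.solve(Ktt, y - (f_train + eps))
    return f_test + update

rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
rng = np.random.default_rng(0)
Xtrain = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(Xtrain[:, 0]) + 0.1 * rng.normal(size=30)
Xtest = np.linspace(-3, 3, 100)[:, None]
sample = pathwise_posterior_sample(Xtest, Xtrain, y, rbf, noise=0.01, rng=rng)
```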
Distributed Gaussian Processes
TLDR
The robust Bayesian Committee Machine, a practical and scalable product-of-experts model for large-scale distributed GP regression, is introduced; it can be used on heterogeneous computing infrastructures, ranging from laptops to clusters.
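
A sketch of the robust BCM aggregation rule, assuming the standard precision-weighted combination with differential-entropy weights from Deisenroth and Ng (2015); array shapes and names are illustrative, and each expert's predictions would come from a GP fit to its own data shard.

```python
import numpy as np

def rbcm_combine(means, variances, prior_var):
    """Robust Bayesian Committee Machine aggregation of per-expert GP
    predictions. means/variances: shape (num_experts, num_test);
    prior_var: prior variance k(x*, x*) at the test points.
    beta weights each expert by its entropy reduction relative to the prior."""
    means, variances = np.asarray(means, float), np.asarray(variances, float)
    beta = 0.5 * (np.log(prior_var) - np.log(variances))
    precision = (beta / variances).sum(axis=0) + (1.0 - beta.sum(axis=0)) / prior_var
    var = 1.0 / precision
    mean = var * (beta * means / variances).sum(axis=0)
    return mean, var

# Two toy experts predicting at three test points, with unit prior variance.
mu, var = rbcm_combine(means=[[0.1, 0.5, 0.9], [0.2, 0.4, 1.1]],
                       variances=[[0.20, 0.30, 0.40], [0.25, 0.50, 0.30]],
                       prior_var=1.0)
```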
Efficient High Dimensional Bayesian Optimization with Additivity and Quadrature Fourier Features
TLDR
An efficient and provably no-regret Bayesian optimization algorithm for black-box functions in high dimensions is proposed, together with a novel deterministic Fourier Features approximation based on numerical integration, with a detailed analysis for the squared exponential kernel.
Scalable Log Determinants for Gaussian Process Kernel Learning
TLDR
It is found that Lanczos is generally superior to Chebyshev for kernel learning, and that a surrogate approach can be highly efficient and accurate with popular kernels.
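
The estimators compared here combine the identity log|K| = tr(log K) with stochastic (Hutchinson) trace estimation, where Lanczos or Chebyshev methods approximate log(K) @ z from matrix-vector products alone. The sketch below substitutes a dense eigendecomposition for Lanczos quadrature, purely to keep the example short; names are illustrative.

```python
import numpy as np

def hutchinson_logdet(K, num_probes=50, rng=None):
    """Stochastic estimate of log|K| via log|K| = tr(log K) and Hutchinson's
    trace estimator with Rademacher probes. A dense eigendecomposition stands
    in for the Lanczos/Chebyshev matrix-function approximation."""
    rng = rng or np.random.default_rng()
    n = K.shape[0]
    w, V = np.linalg.eigh(K)
    logK = (V * np.log(w)) @ V.T                         # log(K) for SPD K
    z = rng.choice([-1.0, 1.0], size=(n, num_probes))    # Rademacher probes
    return np.mean(np.sum(z * (logK @ z), axis=0))       # mean of z^T log(K) z

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 200))
K = A @ A.T + 200 * np.eye(200)                          # well-conditioned SPD matrix
print(hutchinson_logdet(K, rng=rng), np.linalg.slogdet(K)[1])   # estimate vs exact
```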
…