Gradient-based explanations for Gaussian Process regression and classification models

Author: Sarem Seitz
ABSTRACT Gaussian Processes (GPs) have proven themselves as a reliable and effective method in probabilistic Machine Learning. Thanks to recent and current advances, modeling complex data with GPs is becoming more and more feasible. Thus, these types of models are, nowadays, an interesting alternative to Neural and Deep Learning methods, which are arguably the current state-of-the-art in Machine Learning. For the latter, we see an increasing interest in so-called explainable approaches in…

Sparse Gaussian Processes using Pseudo-inputs

It is shown that this new Gaussian process (GP) regression model can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.

High-Dimensional Gaussian Process Inference with Derivatives

It is shown that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2 D + (N^2)^3)$ and, in special cases, to $\mathcal{O}(N^2 D + N^3)$.

Parametric Gaussian Process Regressors

In an extensive empirical comparison with a number of alternative methods for scalable GP regression, it is found that the resulting predictive distributions exhibit significantly better calibrated uncertainties and higher log likelihoods, often by as much as half a nat per datapoint.

A Differentiable Programming System to Bridge Machine Learning and Scientific Computing

Zygote, a Differentiable Programming system, is described. It is able to take gradients of general program structures, supports almost all language constructs, and compiles high-performance code without requiring any user intervention or refactoring to stage computations.

Evaluating the squared-exponential covariance function in Gaussian processes with integral observations

A new approach is proposed in which the double integral is reduced to a single integral using the error function and this single integral is then computed with efficiently implemented numerical techniques.
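
The reduction mentioned above can be illustrated with a minimal sketch. It assumes a one-dimensional squared-exponential kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2)) integrated over two intervals. The inner integral over y has a closed form in terms of the error function, so only the outer integral over x needs numerical treatment. The function names, the unit signal variance, and the midpoint rule for the outer integral are assumptions made for illustration; they are not the paper's implementation, and the summary does not say which numerical technique the paper uses.

```python
import math


def se_kernel(x, y, ell=1.0):
    """Squared-exponential kernel with unit signal variance (assumed)."""
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))


def inner_integral_erf(x, a, b, ell=1.0):
    """Closed form of the integral of se_kernel(x, y) over y in [a, b]."""
    scale = ell * math.sqrt(math.pi / 2.0)
    s = math.sqrt(2.0) * ell
    return scale * (math.erf((b - x) / s) - math.erf((a - x) / s))


def midpoint(f, lo, hi, n=400):
    """Composite midpoint rule, used here as a simple stand-in quadrature."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def double_integral_via_erf(a, b, c, d, ell=1.0, n=400):
    """Integral over x in [c, d] of the erf-based inner integral over y in [a, b]."""
    return midpoint(lambda x: inner_integral_erf(x, a, b, ell), c, d, n)


def double_integral_brute(a, b, c, d, ell=1.0, n=200):
    """Reference value: nested numerical quadrature over both variables."""
    return midpoint(
        lambda x: midpoint(lambda y: se_kernel(x, y, ell), a, b, n), c, d, n
    )


if __name__ == "__main__":
    fast = double_integral_via_erf(0.0, 1.0, 0.0, 2.0, ell=0.7)
    slow = double_integral_brute(0.0, 1.0, 0.0, 2.0, ell=0.7)
    print(f"erf-based: {fast:.6f}, nested quadrature: {slow:.6f}")
```

The erf-based version evaluates the integrand n times, while the nested version needs n^2 kernel evaluations, which is the practical benefit of reducing the double integral to a single one.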

Fashionable Modelling with Flux

A framework named Flux is presented that shows how further refinement of the core ideas of machine learning, built upon the foundation of the Julia programming language, can yield an environment that is simple, easily modifiable, and performant.

Scaling Gaussian Process Regression with Derivatives

This work proposes iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction.

Convolutional Gaussian Processes

It is shown how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance, and it is hoped that this illustration of the usefulness of a marginal likelihood will help automate discovering architectures in larger models.

Methods for interpreting and understanding deep neural networks

Axiomatic Attribution for Deep Networks

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms— Sensitivity and