# Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes

@inproceedings{Kim2018ScalingUT, title={Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes}, author={Hyunjik Kim and Yee Whye Teh}, booktitle={AISTATS}, year={2018} }

Automating statistical modelling is a challenging problem in artificial intelligence. The Automatic Statistician takes a first step in this direction by employing a kernel search algorithm with Gaussian Processes (GP) to provide interpretable statistical models for regression problems. However, this does not scale, because its model selection runs in $O(N^3)$ time. We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to…
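To make the $O(N^3)$ cost in the abstract concrete, the following is a minimal sketch of exact GP model selection, not the paper's SKC algorithm. It scores each candidate kernel by the exact log marginal likelihood, whose Cholesky factorisation is the cubic step. All function names, the fixed hyperparameters, and the toy data are assumptions made for illustration. The original Automatic Statistician also optimises hyperparameters and searches over compositions of kernels, both of which are omitted here.

```python
import math


def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on scalar inputs (hyperparameters fixed for illustration)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0, variance=1.0):
    """Standard periodic kernel on scalar inputs (hyperparameters fixed for illustration)."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return variance * math.exp(-2.0 * (s / lengthscale) ** 2)


def cholesky(A):
    """Return lower-triangular L with L L^T = A. The three nested loops are the O(N^3) step."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def log_marginal_likelihood(xs, ys, kernel, noise=0.1):
    """Exact log N(y | 0, K + noise^2 I) for a zero-mean GP."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution: solve L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def best_kernel(xs, ys, candidates):
    """Score every candidate kernel exactly and return the best name with all scores."""
    scores = {name: log_marginal_likelihood(xs, ys, k) for name, k in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy data (assumed): a noiseless sine wave with period 1, sampled at 20 points.
xs = [0.25 * i for i in range(20)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
best, scores = best_kernel(xs, ys, {"SE": se_kernel, "PER": periodic_kernel})
print(best, scores)
```

Every candidate kernel requires its own factorisation, so the cost of the search grows with both the number of candidates and the cube of the number of data points. That is the bottleneck the abstract refers to.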

## 33 Citations

GPy-ABCD: A Configurable Automatic Bayesian Covariance Discovery Implementation

- Computer Science
- 2021

This paper presents a lighter, more functional and configurable implementation of the ABCD idea, outputting only fit models and short descriptions: the Python package GPy-ABCD, which was developed as part of an adaptive modelling component for the FRANK query-answering system.

Automatic Bayesian Density Analysis

- Computer Science
- AAAI
- 2019

Automatic Bayesian Density Analysis (ABDA) allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation.

Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior

- Computer Science
- AAAI
- 2021

A novel probabilistic algorithm is presented that learns a kernel composition by handling the sparsity in the kernel selection with a Horseshoe prior, and it is demonstrated that this model can capture characteristics of time series with significant reductions in computational time while achieving competitive regression performance on real-world data sets.

Towards Large-scale Gaussian Process Models for Efficient Bayesian Machine Learning

- Computer Science
- DATA
- 2020

A new large-scale GPM structure is developed, which incorporates a divide-and-conquer-based paradigm and thus enables efficient GPM retrieval for large-scale data, and outlines challenges concerning this newly developed GPM structure regarding its algorithmic retrieval, its integration with given data platforms and technologies, as well as cross-model comparability and interpretability.

Automatic Gaussian Process Model Retrieval for Big Data

- Computer Science
- CIKM
- 2020

This work proposes a new approach that allows GPMs to be retrieved efficiently and automatically for large-scale data and demonstrates the quality of the resulting models, which clearly outperform default GPM instantiations while maintaining reasonable model training time.

Scalable Variational Bayesian Kernel Selection for Sparse Gaussian Process Regression

- Computer Science
- AAAI
- 2020

Though it is computationally challenging to jointly optimize a large number of hyperparameters due to many kernels being evaluated simultaneously by the VBKS algorithm, it is shown that the variational lower bound of the log-marginal likelihood can be decomposed into an additive form such that each additive term depends only on a disjoint subset of the variational variables and can thus be optimized independently.

Large-scale Retrieval of Bayesian Machine Learning Models for Time Series Data via Gaussian Processes

- Computer Science
- KDIR
- 2020

The Timeseries Automatic GPM Retrieval (TAGR) algorithm is proposed, which is composed of independent statistical representations for non-overlapping segments of the given data and reduces computation time by orders of magnitude.

3CS Algorithm for Efficient Gaussian Process Model Retrieval

- Computer Science
- 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021

This paper proposes a novel approach for efficient large-scale GPM retrieval: the Concatenated Composite Covariance Search (3CS) algorithm, which makes use of multiple local kernel searches on dynamically partitioned data to overcome the performance limitations of state-of-the-art GPM retrieval algorithms.

Time Series Forecasting with Gaussian Processes Needs Priors

- Computer Science
- ECML/PKDD
- 2021

A composition of kernels is proposed, which contains the components needed to model most time series: a linear trend, periodic patterns, and another flexible kernel for modeling the non-linear trend. Priors are assigned to the hyperparameters in order to keep the inference within a plausible range.

Discovering Latent Covariance Structures for Multiple Time Series

- Computer Science
- ICML
- 2019

A new GP model is presented which naturally handles multiple time series by placing an Indian Buffet Process (IBP) prior on the presence of shared kernels, and a selective covariance structure decomposition allows exploiting shared parameters over a set of multiple, selected time series.

## References

Fast Forward Selection to Speed Up Sparse Gaussian Process Regression

- Computer Science
- AISTATS
- 2003

A method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection, which leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically.

Automatic Construction and Natural-Language Description of Nonparametric Regression Models

- Computer Science
- AAAI
- 2014

The beginnings of an automatic statistician are presented, focusing on regression problems; the system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural-language text.

Bayesian optimization for automated model selection

- Computer Science
- NIPS
- 2016

This work presents a sophisticated method for automatically searching for an appropriate kernel from an infinite space of potential choices, based on Bayesian optimization in model space, and constructs a novel kernel between models to explain a given dataset.

Discovering and Exploiting Additive Structure for Bayesian Optimization

- Computer Science
- AISTATS
- 2017

This work investigates how to automatically discover hidden additive structure while simultaneously exploiting it through Bayesian optimization, proposing an efficient algorithm based on Metropolis–Hastings sampling and demonstrating its efficacy empirically on synthetic and real-world data sets.

Gaussian Process Kernels for Pattern Discovery and Extrapolation

- Computer Science
- ICML
- 2013

This work introduces simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation, and shows that it is possible to reconstruct several popular standard covariances within this framework.

Bayesian Nonparametric Kernel-Learning

- Computer Science
- AISTATS
- 2016

Bayesian nonparametric kernel-learning (BaNK) is a generic, data-driven framework for scalable learning of kernels that places a nonparametric prior on the spectral distribution of random frequencies, allowing it to both learn kernels and scale to large datasets.

Structure Discovery in Nonparametric Regression through Compositional Kernel Search

- Computer Science
- ICML
- 2013

This work defines a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels, and presents a method for searching over this space of structures which mirrors the scientific discovery process.

Sparse Gaussian Processes using Pseudo-inputs

- Computer Science
- NIPS
- 2005

It is shown that this new Gaussian process (GP) regression model can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.

Thoughts on Massively Scalable Gaussian Processes

- Computer Science
- ArXiv
- 2015

The MSGP framework enables the use of Gaussian processes on billions of datapoints, without requiring distributed inference or severe assumptions, and reduces the standard GP learning and inference complexity to $O(n)$, and the standard test point prediction complexity to $O(1)$.

Less is More: Nyström Computational Regularization

- Computer Science
- NIPS
- 2015

A simple incremental variant of Nyström Kernel Regularized Least Squares is suggested, where the subsampling level implements a form of computational regularization, in the sense that it controls regularization and computations at the same time.