# Automatic Construction and Natural-Language Description of Nonparametric Regression Models

@inproceedings{Lloyd2014AutomaticCA, title={Automatic Construction and Natural-Language Description of Nonparametric Regression Models}, author={James Robert Lloyd and David Kristjanson Duvenaud and Roger B. Grosse and Joshua B. Tenenbaum and Zoubin Ghahramani}, booktitle={AAAI}, year={2014} }

This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural language text.
Our approach treats unknown regression functions nonparametrically using Gaussian processes, which has two important consequences. First, Gaussian processes can model functions in terms of high-level properties (e…
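
As a rough illustration of the idea in the abstract (Gaussian process regression combined with a search over candidate models), the following minimal sketch scores three hand-picked covariance functions on a toy data set by their log marginal likelihood and keeps the best one. The kernel choices, fixed hyperparameters, noise level, and synthetic data are all assumptions made for this example; they are not taken from the paper, whose actual search procedure is not described on this page.

```python
import numpy as np

# Hypothetical base kernels with fixed hyperparameters (assumed for illustration).
def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel: encodes smoothness."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0, variance=1.0):
    """Periodic kernel: encodes exact repetition with the given period."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def linear_kernel(x1, x2, variance=1.0):
    """Linear kernel: encodes straight-line trends through the origin."""
    return variance * x1[:, None] * x2[None, :]

def log_marginal_likelihood(K, y, noise=0.1):
    """Log evidence of a zero-mean GP with Gaussian observation noise."""
    n = len(y)
    L = np.linalg.cholesky(K + noise ** 2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

# Synthetic data (assumed): a noisy sine wave with period 1.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 40)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Score each candidate model and keep the one with the highest evidence.
candidates = {
    "SE": se_kernel(x, x),
    "PER": periodic_kernel(x, x, period=1.0),
    "LIN": linear_kernel(x, x),
}
scores = {name: log_marginal_likelihood(K, y) for name, K in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

A real system would also optimise the kernel hyperparameters and consider a much larger model space; this sketch only shows how the marginal likelihood can rank a fixed set of candidates.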

## 203 Citations

Automatic Generation of Probabilistic Programming from Time Series Data

- Computer Science
- 2016

A new perspective is provided for building expressive probabilistic programs from continuous time series data when the model structure is not given, and it is reported that such a descriptive covariance structure efficiently and accurately derives a probabilistic programming description.

Automatic Construction of Nonparametric Relational Regression Models for Multiple Time Series

- Computer Science
- ICML
- 2016

This work proposes two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes, and shows that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets.

The Automatic Statistician: A Relational Perspective

- Computer Science
- ArXiv
- 2015

This work proposes two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes, and shows that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets.

Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes

- Computer Science
- AISTATS
- 2018

This paper proposes Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to bigger data sets and derives a cheap upper bound on the GP marginal likelihood that sandwiches the marginal likelihood with the variational lower bound.

Time Series Structure Discovery via Probabilistic Program Synthesis

- Computer Science
- 2016

This paper shows how to extend Automatic Bayesian Covariance Discovery (ABCD) by formulating it in terms of probabilistic program synthesis, and demonstrates an application to time series clustering that involves a nonparametric extension to ABCD, experiments on interpolation and extrapolation of real-world econometric data, and improvements in accuracy over both nonparametric and standard regression baselines.

Bayesian synthesis of probabilistic programs for automatic data modeling

- Computer Science
- Proc. ACM Program. Lang.
- 2019

Experimental results show that the techniques presented can accurately infer qualitative structure in multiple real-world data sets and outperform standard data analysis methods in forecasting and predicting new data.

Human-like Time Series Summaries via Trend Utility Estimation

- Computer Science
- ArXiv
- 2020

This paper proposes a model to create human-like text descriptions for time series that finds patterns in time series data and ranks these patterns based on empirical observations of human behavior using utility estimation.

Model Selection for Gaussian Process Regression

- Computer Science
- GCPR
- 2017

Based on the principle of posterior agreement, a general framework for model selection to rank kernels for Gaussian process regression is developed and compared with maximum evidence and leave-one-out cross-validation.

Automatic Bayesian Density Analysis

- Computer Science
- AAAI
- 2019

Automatic Bayesian Density Analysis (ABDA) allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation.

Chapter 9 The Automatic Statistician

- Computer Science
- 2018

This chapter describes the common architecture of such Automatic Statistician systems, and discusses some of the design decisions and technical challenges.

## References

Exploiting compositionality to explore a large space of model structures

- Computer Science
- UAI
- 2012

This work organizes a space of matrix decomposition models into a context-free grammar which generates a wide variety of structures through the compositional application of a few simple rules and automatically chooses the decomposition structure from raw data by evaluating only a small fraction of all models.

Evolution of Covariance Functions for Gaussian Process Regression Using Genetic Programming

- Computer Science
- EUROCAST
- 2013

An approach is described for evolving composite covariance functions for Gaussian processes, using genetic programming to search over the space of sentences that can be derived from the grammar.

Declarative Bias in Equation Discovery

- Computer Science
- 1997

An equation discovery system, Lagramge, is presented that uses grammars to define and restrict its hypothesis space. It was successfully applied to three artificial domains and to a real-world problem, discovering equations that make sense in terms of domain knowledge and produce accurate predictions.

Distilling Free-Form Natural Laws from Experimental Data

- Physics
- Science
- 2009

This work proposes a principle for the identification of nontriviality and demonstrates the approach by automatically searching motion-tracking data captured from various physical systems, ranging from simple harmonic oscillators to chaotic double pendula, discovering Hamiltonians, Lagrangians, and other laws of geometric and momentum conservation.

Structure Discovery in Nonparametric Regression through Compositional Kernel Search

- Computer Science
- ICML
- 2013

This work defines a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels, and presents a method for searching over this space of structures which mirrors the scientific discovery process.

The discovery of structural form

- Computer Science
- Proceedings of the National Academy of Sciences
- 2008

This work presents a computational model that learns structures of many different forms and discovers which form is best for a given dataset; it brings structure learning methods closer to human abilities and may lead to a deeper computational understanding of cognitive development.

Gaussian Processes for Machine Learning

- Computer Science
- Adaptive computation and machine learning
- 2009

The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.

Gaussian Process Covariance Kernels for Pattern Discovery and Extrapolation

- Computer Science
- ArXiv
- 2013

This work introduces simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation, and shows that they can reconstruct standard covariances within this framework.

Nonparametric dynamics estimation for time periodic systems

- Mathematics
- 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2013

This work investigates nonparametric system identification with an explicit focus on periodically recurring nonlinear effects within a Gaussian process regression framework, and designs a locally periodic covariance function to shape the hypothesis space, which allows for a structured extrapolation that is not possible with more widely used covariance functions.