# On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

@article{Nunes2010OnOS, title={On Optimal Selection of Summary Statistics for Approximate Bayesian Computation}, author={Matthew A. Nunes and David J. Balding}, journal={Statistical Applications in Genetics and Molecular Biology}, year={2010}, volume={9} }

How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary…
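To make the ABC idea in the abstract concrete, here is a minimal sketch of plain rejection ABC in Python. It is not the paper's selection procedure. The toy model (Normal observations with unknown mean), the uniform prior, the sample mean as summary statistic, and the names `simulate`, `summary`, and `abc_rejection` are all illustrative assumptions.

```python
import numpy as np


def simulate(theta, n, rng):
    # Hypothetical model: n i.i.d. Normal(theta, 1) observations.
    return rng.normal(loc=theta, scale=1.0, size=n)


def summary(data):
    # Hypothetical summary statistic: the sample mean.
    return float(np.mean(data))


def abc_rejection(observed, n_draws, accept_frac, rng):
    """Keep the prior draws whose simulated summaries lie closest to the observed one."""
    s_obs = summary(observed)
    # Assumed prior: Uniform(-5, 5) on the parameter of interest.
    thetas = rng.uniform(-5.0, 5.0, size=n_draws)
    # Simulate a dataset per draw and measure the distance between summaries;
    # no likelihood is evaluated anywhere.
    dists = np.array(
        [abs(summary(simulate(t, observed.size, rng)) - s_obs) for t in thetas]
    )
    n_keep = max(1, int(accept_frac * n_draws))
    keep = np.argsort(dists)[:n_keep]
    return thetas[keep]


rng = np.random.default_rng(0)
observed = simulate(2.0, 50, rng)  # stand-in for real data
posterior_sample = abc_rejection(observed, n_draws=5000, accept_frac=0.02, rng=rng)
```

The accepted draws in `posterior_sample` approximate the posterior for the mean. With a less informative summary statistic the accepted draws would spread more widely, which is the kind of effect a summary-selection criterion has to account for.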

#### 142 Citations

Summary statistics and sequential methods for approximate Bayesian computation

- Computer Science
- 2011

This thesis looks at two related methodological issues for ABC: a method is proposed to construct appropriate summary statistics for ABC in a semi-automatic manner, and an alternative sequential ABC approach is proposed in which simulated and observed data are compared for each data set and combined to give overall results.

A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

- Biology, Medicine
- Genetics
- 2012

An approach for choosing summary statistics based on boosting, a technique from the machine-learning literature, is proposed and it is found that ABC with summary statistics chosen locally via boosting with the L2-loss performs best.

K2-ABC: Approximate Bayesian Computation with Kernel Embeddings

- Computer Science, Mathematics
- AISTATS
- 2016

This paper proposes a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics, and uses maximum mean discrepancy (MMD) as a dissimilarity measure between the distributions over observed and simulated data.

An automatic adaptive method to combine summary statistics in approximate Bayesian computation

- Computer Science, Mathematics
- PloS one
- 2020

This work develops an automatic, adaptive algorithm that aims to maximize the distance between the prior and the approximate posterior by automatically adapting the weights within the ABC distance function, using a nearest neighbour estimator of the distance between distributions.

DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression

- Mathematics, Computer Science
- ICML
- 2016

A novel framework is developed that models the functional relationship between data distributions and the optimal choice of summary statistics using kernel-based distribution regression and can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning.

A comparative review of dimension reduction methods in approximate Bayesian computation

- Mathematics
- 2013

Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions.…

Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits

- Mathematics, Computer Science
- ArXiv
- 2018

This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidate summary statistics as a multi-armed bandit problem, which allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well-performing summary statistics as opposed to a fixed set of statistics.

Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models

- Medicine
- BioMed research international
- 2013

The user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics.

Choosing summary statistics by least angle regression for approximate Bayesian computation

- Mathematics
- 2016

Bayesian statistical inference relies on the posterior distribution. Depending on the model, the posterior can be more or less difficult to derive. In recent years, there has been a lot of…

Local dimension reduction of summary statistics for likelihood-free inference

- Computer Science, Mathematics
- Stat. Comput.
- 2020

A localization strategy is introduced for any projection-based dimension reduction method, in which the transformation is estimated in the neighborhood of the observed data instead of the whole space, to improve the estimation accuracy for localized versions of linear regression and partial least squares.

#### References

Approximate Bayesian computation in population genetics.

- Biology, Medicine
- Genetics
- 2002

A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty.

Approximately Sufficient Statistics and Bayesian Computation

- Computer Science, Medicine
- Statistical applications in genetics and molecular biology
- 2008

A sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference, which can be applied to high-dimensional data sets for which exact likelihood equations are not possible.

Non-linear regression models for Approximate Bayesian Computation

- Mathematics, Computer Science
- Stat. Comput.
- 2010

A machine-learning approach to the estimation of the posterior density that introduces two innovations: it fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling.

ABCtoolbox: a versatile toolkit for approximate Bayesian computations

- Biology, Computer Science
- BMC Bioinformatics
- 2009

ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.

Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems

- Computer Science, Medicine
- Journal of The Royal Society Interface
- 2008

This paper discusses and applies an ABC method based on sequential Monte Carlo (SMC) to estimate parameters of dynamical models and develops ABC SMC as a tool for model selection; given a range of different mathematical descriptions, it is able to choose the best model using the standard Bayesian model selection apparatus.

Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation

- Medicine, Computer Science
- Bioinform.
- 2008

Key methods used in DIY ABC, a computer program for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples, are described.

Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood

- Biology, Medicine
- Genetics
- 2009

The principal idea is to relax the tolerance within MCMC to permit good mixing, but retain a good approximation to the posterior by a combination of subsampling the output and regression adjustment, which will realize substantial computational advances over standard ABC.

Nearest Neighbor Estimates of Entropy

- Mathematics
- 2003

Motivated by the problems in molecular sciences, we introduce new nonparametric estimators of entropy which are based on the kth nearest neighbor distances between the n sample…

On the estimation of entropy

- Mathematics
- 1993

Motivated by recent work of Joe (1989, Ann. Inst. Statist. Math., 41, 683–697), we introduce estimators of entropy and describe their properties. We study the effects of tail behaviour, distribution…

Likelihood-Based Local Linear Estimation of the Conditional Variance Function

- Mathematics
- 2004

We consider estimation of mean and variance functions with kernel-weighted local polynomial fitting in a heteroscedastic nonparametric regression model. Our preferred estimators are based on a…