# On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

@article{Nunes2010OnOS, title={On Optimal Selection of Summary Statistics for Approximate Bayesian Computation}, author={Matthew A. Nunes and David Joseph Balding}, journal={Statistical Applications in Genetics and Molecular Biology}, year={2010}, volume={9} }

How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary…
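The idea of replacing the likelihood with simulation and comparing datasets through summaries can be illustrated with a minimal rejection-ABC sketch. This is not the selection method of the paper. The model (Normal with unknown mean and unit variance), the Uniform(-5, 5) prior, the sample mean as summary statistic, and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Simulate n observations from the assumed model Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic: the sample mean (an illustrative choice)."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the assumed Uniform(-5, 5) prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:  # compare summaries; no likelihood is evaluated
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # stand-in for real data, generated with theta = 2
posterior_sample = abc_rejection(observed, n_draws=5000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior_sample)
```

The accepted values of `theta` approximate a sample from the posterior given the summary. How well they approximate the posterior given the full data depends on how informative the chosen summary is, which is the question the paper addresses.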

## 157 Citations

### Summary statistics and sequential methods for approximate Bayesian computation

- Computer Science
- 2011

This thesis looks at two related methodological issues for ABC: a method is proposed to construct appropriate summary statistics for ABC in a semi-automatic manner, and an alternative sequential ABC approach is proposed in which simulated and observed data are compared for each data set and combined to give overall results.

### A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

- Biology
- Genetics
- 2012

An approach for choosing summary statistics based on boosting, a technique from the machine-learning literature, is proposed and it is found that ABC with summary statistics chosen locally via boosting with the L2-loss performs best.

### K2-ABC: Approximate Bayesian Computation with Kernel Embeddings

- Computer Science
- AISTATS
- 2016

This paper proposes a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics, and uses maximum mean discrepancy (MMD) as a dissimilarity measure between the distributions over observed and simulated data.
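As a rough illustration of using MMD as a dissimilarity between observed and simulated data, the sketch below computes a biased estimate of the squared MMD for one-dimensional samples. It is not the K2-ABC implementation. The Gaussian kernel, the bandwidth of 1.0, and the function names are assumptions made for this example.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two 1-D samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(1)
observed = [rng.gauss(0.0, 1.0) for _ in range(200)]
sim_close = [rng.gauss(0.0, 1.0) for _ in range(200)]  # simulated under matching parameters
sim_far = [rng.gauss(3.0, 1.0) for _ in range(200)]    # simulated under mismatched parameters

d_close = mmd_squared(observed, sim_close)
d_far = mmd_squared(observed, sim_far)
```

Because the dissimilarity is computed on the raw samples, no summary statistic has to be chosen by hand; the kernel and its bandwidth take over that role.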

### Selection of Summary Statistics for Network Model Choice with Approximate Bayesian Computation

- Computer Science
- 2021

The findings show that computationally inexpensive summary statistics can be efficiently selected with minimal impact on classification accuracy, and that networks with a smaller number of nodes can only be employed to eliminate a moderate number of summaries.

### An automatic adaptive method to combine summary statistics in approximate Bayesian computation

- Computer Science
- PLoS ONE
- 2020

This work develops an automatic, adaptive algorithm that aims to maximize the distance between the prior and the approximate posterior by automatically adapting the weights within the ABC distance function, using a nearest neighbour estimator of the distance between distributions.

### DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression

- Computer Science
- ICML
- 2016

A novel framework is developed that models the functional relationship between data distributions and the optimal choice of summary statistics using kernel-based distribution regression, and that can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning.

### A comparative review of dimension reduction methods in approximate Bayesian computation

- Computer Science
- 2013

This article provides a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature, split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization.

### Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits

- Computer Science
- ArXiv
- 2018

This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidate summary statistics as a multi-armed bandit problem, which allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well-performing summary statistics as opposed to a fixed set of statistics.

### Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models

- Computer Science
- BioMed Research International
- 2013

The user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics.

### Choosing summary statistics by least angle regression for approximate Bayesian computation

- Computer Science
- 2016

A new algorithm based on least angle regression is developed for choosing summary statistics, and its performance is found to be better than that of a previously proposed approach that uses partial least squares.

## References

### Approximate Bayesian computation in population genetics.

- Computer Science
- Genetics
- 2002

A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty.

### Approximately Sufficient Statistics and Bayesian Computation

- Mathematics
- Statistical Applications in Genetics and Molecular Biology
- 2008

A sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference, which can be applied to high-dimensional data sets for which exact likelihood equations are not possible.

### Non-linear regression models for Approximate Bayesian Computation

- Computer Science
- Stat. Comput.
- 2010

A machine-learning approach to the estimation of the posterior density is proposed, introducing two innovations: it fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling.

### ABCtoolbox: a versatile toolkit for approximate Bayesian computations

- Biology, Computer Science
- BMC Bioinformatics
- 2009

ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis: parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.

### Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems

- Computer Science, Mathematics
- Journal of The Royal Society Interface
- 2008

This paper discusses and applies an ABC method based on sequential Monte Carlo (SMC) to estimate parameters of dynamical models and develops ABC SMC as a tool for model selection; given a range of different mathematical descriptions, it is able to choose the best model using the standard Bayesian model selection apparatus.

### Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation

- Computer Science
- Bioinform.
- 2008

Key methods used in DIY ABC, a computer program for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples, are described.

### Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood

- Computer Science
- Genetics
- 2009

The principal idea is to relax the tolerance within MCMC to permit good mixing, but retain a good approximation to the posterior by a combination of subsampling the output and regression adjustment, which will realize substantial computational advances over standard ABC.

### Nearest Neighbor Estimates of Entropy

- Mathematics
- 2003

Motivated by the problems in molecular sciences, we introduce new nonparametric estimators of entropy which are based on the kth nearest neighbor distances between the n sample…

### On the estimation of entropy

- Computer Science, Mathematics
- 1993

The authors' estimators are different from Joe's, and may be computed without numerical integration, but it can be shown that the same interaction of tail behaviour, smoothness and dimensionality also determines the convergence rate of Joe's estimator.

### Likelihood-Based Local Linear Estimation of the Conditional Variance Function

- Mathematics
- 2004

We consider estimation of mean and variance functions with kernel-weighted local polynomial fitting in a heteroscedastic nonparametric regression model. Our preferred estimators are based on a…