On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

Matthew A. Nunes and David Joseph Balding
Statistical Applications in Genetics and Molecular Biology
Published 6 September 2010
How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary… 
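The rejection-sampling mechanics described here can be sketched in a few lines. The toy model below (a Normal mean with a uniform prior and the sample mean as the lone summary statistic) is purely illustrative and is not the paper's genetics application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: infer the mean mu of a Normal(mu, 1) model,
# using the sample mean as the single summary statistic.
observed = rng.normal(2.0, 1.0, size=100)
s_obs = observed.mean()

def abc_rejection(s_obs, n_sims=20_000, eps=0.05):
    """Basic ABC rejection: keep prior draws whose simulated summary
    lands within eps of the observed summary."""
    accepted = []
    for _ in range(n_sims):
        mu = rng.uniform(-5.0, 5.0)            # draw a parameter from the prior
        sim = rng.normal(mu, 1.0, size=100)    # simulate data instead of evaluating a likelihood
        if abs(sim.mean() - s_obs) < eps:      # compare summaries, not full datasets
            accepted.append(mu)
    return np.array(accepted)

posterior_sample = abc_rejection(s_obs)        # concentrates near the true mean of 2
```

The accepted parameter values approximate the posterior; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate, which is exactly why the choice of summary statistic matters.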

Tables from this paper

Summary statistics and sequential methods for approximate Bayesian computation

This thesis addresses two related methodological issues for ABC: a method for constructing appropriate summary statistics in a semi-automatic manner, and an alternative sequential ABC approach in which simulated and observed data are compared separately for each data set and the comparisons are combined to give overall results.

A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

An approach for choosing summary statistics based on boosting, a technique from the machine-learning literature, is proposed; ABC with summary statistics chosen locally via boosting with the L2 loss is found to perform best.

K2-ABC: Approximate Bayesian Computation with Kernel Embeddings

This paper proposes a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics, and uses maximum mean discrepancy (MMD) as a dissimilarity measure between the distributions over observed and simulated data.
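The MMD dissimilarity at the heart of this approach can be estimated directly from two samples. The RBF kernel and bandwidth below are illustrative choices, not necessarily the paper's configuration:

```python
import numpy as np

def rbf_gram(x, y, gamma=1.0):
    """RBF (Gaussian) kernel matrix between two 1-D samples."""
    return np.exp(-gamma * (x[:, None] - y[None, :]) ** 2)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy (MMD^2)."""
    return (rbf_gram(x, x, gamma).mean()
            - 2.0 * rbf_gram(x, y, gamma).mean()
            + rbf_gram(y, y, gamma).mean())

rng = np.random.default_rng(1)
same = mmd2(rng.normal(0, 1, 500), rng.normal(0, 1, 500))  # near zero
diff = mmd2(rng.normal(0, 1, 500), rng.normal(3, 1, 500))  # clearly positive
```

Because MMD compares whole empirical distributions, it can replace the summary-statistic comparison step in ABC entirely, which is the point of the K2-ABC construction.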

Selection of Summary Statistics for Network Model Choice with Approximate Bayesian Computation

The findings show that computationally inexpensive summary statistics can be efficiently selected with minimal impact on classification accuracy, and that networks with fewer nodes can be used to eliminate only a moderate number of summaries.

An automatic adaptive method to combine summary statistics in approximate Bayesian computation

This work develops an automatic, adaptive algorithm that aims to maximize the distance between the prior and the approximate posterior by automatically adapting the weights within the ABC distance function, using a nearest neighbour estimator of the distance between distributions.

DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression

A novel framework is developed that models the functional relationship between data distributions and the optimal choice of summary statistics using kernel-based distribution regression; it can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning.

A comparative review of dimension reduction methods in approximate Bayesian computation

This article provides a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature, split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization.

Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits

This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidates as a multi-armed bandit problem, which allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well-performing summary statistics rather than a fixed set.
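The bandit mechanic can be illustrated with a minimal epsilon-greedy sketch. The arms, reward function, and `true_means` below are hypothetical stand-ins: in the paper the reward signal is tied to ABC performance, and the bandit algorithm used may differ.

```python
import numpy as np

rng = np.random.default_rng(4)

def epsilon_greedy(reward_fn, n_arms, n_rounds=5_000, eps=0.1):
    """Minimal epsilon-greedy bandit: each arm stands for one candidate
    summary statistic; pulls concentrate on the arm with the best
    average reward while still exploring with probability eps."""
    counts = np.zeros(n_arms)
    values = np.zeros(n_arms)               # running mean reward per arm
    for _ in range(n_rounds):
        if rng.random() < eps:
            arm = int(rng.integers(n_arms)) # explore a random arm
        else:
            arm = int(values.argmax())      # exploit the best arm so far
        r = reward_fn(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
    return counts, values

# Hypothetical rewards: statistic 2 is the most informative on average.
true_means = [0.2, 0.5, 0.8]
counts, values = epsilon_greedy(lambda a: rng.normal(true_means[a], 0.1), 3)
```

The pull counts end up dominated by the best arm, which is the "distribution over well-performing summary statistics" idea in miniature.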

Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models

The user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics.

Choosing summary statistics by least angle regression for approximate Bayesian computation

A new algorithm based on least angle regression is developed for choosing summary statistics; its performance is better than that of a previously proposed approach that uses partial least squares.

Approximate Bayesian computation in population genetics.

A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty.

Approximately Sufficient Statistics and Bayesian Computation

A sequential scheme is proposed for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference; it can be applied to high-dimensional data sets for which exact likelihood calculations are not possible.

Non-linear regression models for Approximate Bayesian Computation

A machine-learning approach to estimating the posterior density is introduced with two innovations: it fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves the estimate using importance sampling.

ABCtoolbox: a versatile toolkit for approximate Bayesian computations

ABCtoolbox allows a user to perform all the steps of a full ABC analysis: sampling parameters from prior distributions, simulating data, computing summary statistics, estimating posterior distributions, choosing among models, validating the estimation procedure, and visualizing the results.

Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems

This paper discusses and applies an ABC method based on sequential Monte Carlo (SMC) to estimate parameters of dynamical models and develops ABC SMC as a tool for model selection; given a range of different mathematical descriptions, it is able to choose the best model using the standard Bayesian model selection apparatus.

Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation

The key methods used in DIY ABC are described: a computer program for inference based on approximate Bayesian computation (ABC) in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples.

Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood

The principal idea is to relax the tolerance within MCMC to permit good mixing, but retain a good approximation to the posterior by a combination of subsampling the output and regression adjustment, which will realize substantial computational advances over standard ABC.
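The basic likelihood-free MCMC loop (without the subsampling and regression-adjustment refinements this paper adds) can be sketched as follows. The Normal-mean target, the flat prior, and all tuning constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
s_obs = 2.0   # observed summary: the sample mean of a hypothetical Normal(mu, 1) dataset

def abc_mcmc(s_obs, n_iter=20_000, eps=0.1, step=0.5):
    """Likelihood-free MCMC sketch: with a symmetric proposal and a flat
    prior, a move is accepted whenever the simulated summary falls within
    eps of the observed one -- no likelihood is ever evaluated."""
    mu = s_obs                                   # start from a pilot estimate to avoid a long burn-in
    chain = []
    for _ in range(n_iter):
        prop = mu + step * rng.normal()          # random-walk proposal
        sim = rng.normal(prop, 1.0, size=100).mean()
        if abs(sim - s_obs) < eps:               # ABC acceptance in place of a likelihood ratio
            mu = prop
        chain.append(mu)
    return np.array(chain)

chain = abc_mcmc(s_obs)                          # mixes around the true mean of 2
```

A small `eps` here causes the poor mixing the paper describes; relaxing it and then correcting the output is the paper's proposed remedy.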

Nearest Neighbor Estimates of Entropy

Motivated by problems in the molecular sciences, we introduce new nonparametric estimators of entropy which are based on the kth nearest neighbor distances between the n sample…
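For a 1-D sample, a kth-nearest-neighbour (Kozachenko–Leonenko-type) entropy estimate can be sketched as follows; this is a generic version of such estimators, not necessarily the exact form proposed in this paper:

```python
import math
import numpy as np

def digamma_int(n):
    """psi(n) for a positive integer n: -gamma + sum_{j<n} 1/j."""
    return -0.5772156649015329 + sum(1.0 / j for j in range(1, n))

def knn_entropy_1d(x, k=3):
    """Kozachenko-Leonenko-type kNN entropy estimate for a 1-D sample:
    H ~ psi(n) - psi(k) + log(2) + (1/n) * sum_i log(r_i),
    where r_i is the distance from x_i to its k-th nearest neighbour
    and 2 is the volume of the 1-D unit ball."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        r[i] = np.partition(d, k)[k]   # k-th smallest distance (index 0 is the point itself)
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + np.mean(np.log(r))

rng = np.random.default_rng(2)
h = knn_entropy_1d(rng.normal(0.0, 1.0, 2000))
# true entropy of N(0, 1) is 0.5 * log(2 * pi * e), about 1.419
```

No density estimate or numerical integration is needed: only nearest-neighbour distances, which is what makes these estimators attractive inside ABC distance functions.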

On the estimation of entropy

The authors' estimators are different from Joe's, and may be computed without numerical integration, but it can be shown that the same interaction of tail behaviour, smoothness and dimensionality also determines the convergence rate of Joe's estimator.

Likelihood-Based Local Linear Estimation of the Conditional Variance Function

We consider estimation of mean and variance functions with kernel-weighted local polynomial fitting in a heteroscedastic nonparametric regression model. Our preferred estimators are based on a…