On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

@article{Nunes2010OnOS,
  title={On Optimal Selection of Summary Statistics for Approximate Bayesian Computation},
  author={Matthew A. Nunes and David J. Balding},
  journal={Statistical Applications in Genetics and Molecular Biology},
  year={2010},
  volume={9}
}
  • M. Nunes, D. Balding
  • Published 2010
  • Medicine, Mathematics
  • Statistical Applications in Genetics and Molecular Biology
How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary…
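The ABC idea described in the abstract (simulate under the model, then compare simulated and observed data through summaries instead of evaluating a likelihood) can be illustrated with a minimal rejection-sampling sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy Normal(theta, 1) model, the Uniform(-5, 5) prior, the sample mean as the summary statistic, the tolerance `eps`, and the function names `simulate`, `summary` and `abc_rejection`.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Draw n observations from the assumed toy model Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic: the sample mean (an assumed choice for this toy model)."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws=10000, eps=0.1, seed=0):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the assumed Uniform(-5, 5) prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= eps:  # compare summaries; no likelihood is evaluated
            accepted.append(theta)
    return accepted


# Hypothetical "observed" data generated with theta = 2.0 for the demonstration.
data_rng = random.Random(1)
observed = simulate(2.0, 50, data_rng)
posterior_sample = abc_rejection(observed)
```

The accepted values in `posterior_sample` approximate draws from the posterior given the summary. Which summary is used, and how small `eps` can be made for a fixed simulation budget, governs the quality of that approximation, which is the selection problem the paper addresses.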
Summary statistics and sequential methods for approximate Bayesian computation
TLDR
This thesis looks at two related methodological issues for ABC: a method is proposed to construct appropriate summary statistics for ABC in a semi-automatic manner, and an alternative sequential ABC approach is proposed in which simulated and observed data are compared for each data set and combined to give overall results.
A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation
TLDR
An approach for choosing summary statistics based on boosting, a technique from the machine-learning literature, is proposed and it is found that ABC with summary statistics chosen locally via boosting with the L2-loss performs best.
K2-ABC: Approximate Bayesian Computation with Kernel Embeddings
TLDR
This paper proposes a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics, and uses maximum mean discrepancy (MMD) as a dissimilarity measure between the distributions over observed and simulated data.
An automatic adaptive method to combine summary statistics in approximate Bayesian computation
TLDR
This work develops an automatic, adaptive algorithm that aims to maximize the distance between the prior and the approximate posterior by automatically adapting the weights within the ABC distance function, using a nearest neighbour estimator of the distance between distributions.
DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression
TLDR
A novel framework is developed that models the functional relationship between data distributions and the optimal choice of summary statistics using kernel-based distribution regression and can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning.
A comparative review of dimension reduction methods in approximate Bayesian computation
Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions.
Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits
TLDR
This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidate summary statistics as a multi-armed bandit problem, which allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well-performing summary statistics as opposed to a fixed set of statistics.
Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models
TLDR
The user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics.
Choosing summary statistics by least angle regression for approximate Bayesian computation
ABSTRACT Bayesian statistical inference relies on the posterior distribution. Depending on the model, the posterior can be more or less difficult to derive. In recent years, there has been a lot of…
Local dimension reduction of summary statistics for likelihood-free inference
TLDR
A localization strategy is introduced for any projection-based dimension reduction method, in which the transformation is estimated in the neighborhood of the observed data instead of the whole space, to improve the estimation accuracy for localized versions of linear regression and partial least squares.

References

Approximate Bayesian computation in population genetics.
TLDR
A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty.
Approximately Sufficient Statistics and Bayesian Computation
  • P. Joyce, P. Marjoram
  • Computer Science, Medicine
  • Statistical applications in genetics and molecular biology
  • 2008
TLDR
A sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference, which can be applied to high-dimensional data sets for which exact likelihood equations are not possible.
Non-linear regression models for Approximate Bayesian Computation
TLDR
A machine-learning approach to the estimation of the posterior density that introduces two innovations: it fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling.
ABCtoolbox: a versatile toolkit for approximate Bayesian computations
TLDR
ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.
Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems
TLDR
This paper discusses and applies an ABC method based on sequential Monte Carlo (SMC) to estimate parameters of dynamical models and develops ABC SMC as a tool for model selection; given a range of different mathematical descriptions, it is able to choose the best model using the standard Bayesian model selection apparatus.
Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation
TLDR
Key methods used in DIY ABC, a computer program for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples, are described.
Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood
TLDR
The principal idea is to relax the tolerance within MCMC to permit good mixing, but retain a good approximation to the posterior by a combination of subsampling the output and regression adjustment, which will realize substantial computational advances over standard ABC.
Nearest Neighbor Estimates of Entropy
SYNOPTIC ABSTRACT Motivated by the problems in molecular sciences, we introduce new nonparametric estimators of entropy which are based on the kth nearest neighbor distances between the n sample…
On the estimation of entropy
Motivated by recent work of Joe (1989, Ann. Inst. Statist. Math., 41, 683–697), we introduce estimators of entropy and describe their properties. We study the effects of tail behaviour, distribution…
Likelihood-Based Local Linear Estimation of the Conditional Variance Function
We consider estimation of mean and variance functions with kernel-weighted local polynomial fitting in a heteroscedastic nonparametric regression model. Our preferred estimators are based on a…