# Approximately Sufficient Statistics and Bayesian Computation

@article{Joyce2008ApproximatelySS, title={Approximately Sufficient Statistics and Bayesian Computation}, author={Paul Joyce and Paul Marjoram}, journal={Statistical Applications in Genetics and Molecular Biology}, year={2008}, volume={7} }

The analysis of high-dimensional data sets is often forced to rely upon well-chosen summary statistics. A systematic approach to choosing such statistics, which is based upon a sound theoretical framework, is currently lacking. In this paper we develop a sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference. Our method can be applied to high-dimensional data sets for which exact likelihood equations are…
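To make the role of a summary statistic concrete, here is a minimal sketch of approximate Bayesian computation by rejection sampling. It is not the sequential scoring scheme developed in the paper. The toy model (normal data with unit variance, a uniform prior on the mean) and the names `abc_rejection`, `prior_sample`, and `simulate` are assumptions made for illustration. The sample mean serves as the summary statistic because it is sufficient for the mean in this toy model.

```python
import random
import statistics


def abc_rejection(observed, summary, simulate, prior_sample,
                  tolerance, n_draws, seed=0):
    """Keep prior draws whose simulated summary lands near the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                        # draw a parameter from the prior
        simulated = simulate(theta, len(observed), rng)  # simulate a data set of the same size
        if abs(summary(simulated) - s_obs) <= tolerance:
            accepted.append(theta)                       # accept: summaries are close enough
    return accepted


# Hypothetical toy model: data ~ Normal(theta, 1), prior theta ~ Uniform(-5, 5).
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


observed = simulate(2.0, 50, random.Random(42))
posterior_draws = abc_rejection(
    observed,
    summary=statistics.fmean,  # summary statistic: the sample mean
    simulate=simulate,
    prior_sample=prior_sample,
    tolerance=0.1,
    n_draws=5000,
)
```

The accepted values in `posterior_draws` approximate the posterior of the mean given the observed summary. Swapping in a less informative `summary` would widen that approximation, which is the kind of loss a statistic-selection method aims to avoid.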

## 252 Citations

### Choosing summary statistics by least angle regression for approximate Bayesian computation

- Computer Science
- 2016

A new algorithm based on least angle regression is developed for choosing summary statistics, and it performs better than a previously proposed approach that uses partial least squares.

### Local dimension reduction of summary statistics for likelihood-free inference

- Computer Science
- Stat. Comput.
- 2020

A localization strategy is introduced for any projection-based dimension reduction method, in which the transformation is estimated in the neighborhood of the observed data instead of the whole space, to improve the estimation accuracy for localized versions of linear regression and partial least squares.

### Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits

- Computer Science
- ArXiv
- 2018

This paper proposes treating the dynamic selection of an appropriate summary statistic from a given pool of candidates as a multi-armed bandit problem, which allows approximate Bayesian computation rejection sampling to focus dynamically on a distribution over well-performing summary statistics rather than on a fixed set of statistics.

### On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

- Computer Science
- Statistical Applications in Genetics and Molecular Biology
- 2010

It was found that the optimal set of summary statistics was highly dataset specific, suggesting that more generally there may be no globally-optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.

### Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC

- Computer Science
- 2010

This work shows how to construct appropriate summary statistics for ABC in a semi-automatic manner, and shows that the optimal summary statistics are the posterior means of the parameters, although these cannot be calculated analytically.

### A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

- Biology
- Genetics
- 2012

An approach for choosing summary statistics based on boosting, a technique from the machine-learning literature, is proposed and it is found that ABC with summary statistics chosen locally via boosting with the L2-loss performs best.

### Choice of Summary Statistic Weights in Approximate Bayesian Computation

- Computer Science
- Statistical Applications in Genetics and Molecular Biology
- 2011

In this paper, we develop a Genetic Algorithm that can address the fundamental problem of how one should weight the summary statistics included in an approximate Bayesian computation analysis built…

### Simulation-based Bayesian analysis of complex data

- Computer Science
- SummerSim
- 2015

This paper argues for the advantage of a simulation-based approximate Bayesian method that remains tractable when other methods become intractable, and demonstrates the utility of simulation-based analyses of large datasets within a rigorous statistical framework.

### Summary Statistics in Approximate Bayesian Computation

- Computer Science
- 2015

This chapter reviews the methods which have been proposed to select low dimensional summaries for ABC, extending the previous review paper of Blum et al. (2013) with recent developments.

## References


### Approximate Bayesian computation in population genetics.

- Computer Science
- Genetics
- 2002

A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty.

### Partition structures and sufficient statistics

- Mathematics
- Journal of Applied Probability
- 1998

Is the Ewens distribution the only one-parameter family of partition structures where the total number of types sampled is a sufficient statistic? In general, the answer is no. It is shown that all…

### Monte Carlo Sampling Methods Using Markov Chains and Their Applications

- Mathematics
- 1970

A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and…

### Approximate Bayesian Computation and MCMC

- Mathematics
- 2004

Methods for simulating observations from posterior distributions without the use of likelihoods are discussed, using an example concerning inference in the fossil record and a novel Markov chain Monte Carlo approach.

### Markov chain Monte Carlo without likelihoods

- Mathematics
- Proceedings of the National Academy of Sciences of the United States of America
- 2003

A Markov chain Monte Carlo method for generating observations from a posterior distribution without the use of likelihoods is presented, which can be used in frequentist applications, in particular for maximum-likelihood estimation.

### Sequential Monte Carlo without likelihoods

- Computer Science
- Proceedings of the National Academy of Sciences
- 2007

This work proposes a sequential Monte Carlo sampler that convincingly overcomes inefficiencies of existing methods and demonstrates its implementation through an epidemiological study of the transmission rate of tuberculosis.

### Modern computational approaches for analysing molecular genetic variation data

- Biology
- Nature Reviews Genetics
- 2006

This work outlines some of these model-based approaches, including the coalescent, and discusses the applicability of the computational methods that are necessary given the highly complex nature of current and future data sets.

### Statistical Tests of the Coalescent Model Based on the Haplotype Frequency Distribution and the Number of Segregating Sites

- Biology
- Genetics
- 2005

A “haplotype configuration test” of neutrality (HCT) based on the full haplotype frequency distribution is developed and the utility of the HCT is demonstrated in simulations of alternative models and in application to data from Drosophila melanogaster.

### The sampling theory of selectively neutral alleles.

- Mathematics
- Theoretical Population Biology
- 1972

### Stochastic simulation

- Computer Science
- Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics
- 1987

Brian D. Ripley's Stochastic Simulation is a short, yet ambitious, survey of modern simulation techniques, and three themes run throughout the book.