Approximately Sufficient Statistics and Bayesian Computation

  • Paul Joyce, Paul Marjoram
  • Published 30 August 2008
  • Mathematics
  • Statistical Applications in Genetics and Molecular Biology
The analysis of high-dimensional data sets is often forced to rely upon well-chosen summary statistics. A systematic approach to choosing such statistics, which is based upon a sound theoretical framework, is currently lacking. In this paper we develop a sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference. Our method can be applied to high-dimensional data sets for which exact likelihood equations are… 

Choosing summary statistics by least angle regression for approximate Bayesian computation

A new algorithm based on least angle regression is developed for choosing summary statistics, and its performance is better than that of a previously proposed approach that uses partial least squares.

Local dimension reduction of summary statistics for likelihood-free inference

A localization strategy is introduced for any projection-based dimension reduction method, in which the transformation is estimated in the neighborhood of the observed data instead of the whole space, to improve the estimation accuracy for localized versions of linear regression and partial least squares.

Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits

This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidate summary statistics as a multi-armed bandit problem, which allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well-performing summary statistics as opposed to a fixed set of statistics.

On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

  • M. Nunes, D. Balding
  • Computer Science
    Statistical Applications in Genetics and Molecular Biology
  • 2010
It was found that the optimal set of summary statistics was highly dataset specific, suggesting that more generally there may be no globally-optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.

Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC

This work shows how to construct appropriate summary statistics for ABC in a semi-automatic manner, and shows that the optimal summary statistics are the posterior means of the parameters, although these cannot be calculated analytically.

A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

An approach for choosing summary statistics based on boosting, a technique from the machine-learning literature, is proposed and it is found that ABC with summary statistics chosen locally via boosting with the L2-loss performs best.

Choice of Summary Statistic Weights in Approximate Bayesian Computation

In this paper, we develop a Genetic Algorithm that can address the fundamental problem of how one should weight the summary statistics included in an approximate Bayesian computation analysis built

Simulation-based Bayesian analysis of complex data

This paper argues for the advantage of a simulation-based approximate Bayesian method that remains tractable when tractability of other methods is lost, and demonstrates the utility of simulation-based analyses of large datasets within a rigorous statistical framework.

Summary Statistics in Approximate Bayesian Computation

This chapter reviews the methods which have been proposed to select low dimensional summaries for ABC, extending the previous review paper of Blum et al. (2013) with recent developments.



Approximate Bayesian computation in population genetics.

A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty.

Partition structures and sufficient statistics

  • P. Joyce
  • Mathematics
    Journal of Applied Probability
  • 1998
Is the Ewens distribution the only one-parameter family of partition structures where the total number of types sampled is a sufficient statistic? In general, the answer is no. It is shown that all

Monte Carlo Sampling Methods Using Markov Chains and Their Applications

A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and

Approximate Bayesian Computation and MCMC

Methods for simulating observations from posterior distributions without the use of likelihoods are discussed, using an example concerning inference in the fossil record and a novel Markov chain Monte Carlo approach.

Markov chain Monte Carlo without likelihoods

A Markov chain Monte Carlo method for generating observations from a posterior distribution without the use of likelihoods is presented, which can be used in frequentist applications, in particular for maximum-likelihood estimation.

Sequential Monte Carlo without likelihoods

This work proposes a sequential Monte Carlo sampler that convincingly overcomes inefficiencies of existing methods and demonstrates its implementation through an epidemiological study of the transmission rate of tuberculosis.

Modern computational approaches for analysing molecular genetic variation data

This work outlines some of these model-based approaches, including the coalescent, and discusses the applicability of the computational methods that are necessary given the highly complex nature of current and future data sets.

Statistical Tests of the Coalescent Model Based on the Haplotype Frequency Distribution and the Number of Segregating Sites

A “haplotype configuration test” of neutrality (HCT) based on the full haplotype frequency distribution is developed and the utility of the HCT is demonstrated in simulations of alternative models and in application to data from Drosophila melanogaster.

The sampling theory of selectively neutral alleles.

  • W. Ewens
  • Mathematics
    Theoretical Population Biology
  • 1972

Stochastic simulation

  • B. Ripley
  • Computer Science
    Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics
  • 1987
Brian D. Ripley's Stochastic Simulation is a short, yet ambitious, survey of modern simulation techniques, and three themes run throughout the book.