Approximately Sufficient Statistics and Bayesian Computation

@article{Joyce2008ApproximatelySS,
  title={Approximately Sufficient Statistics and Bayesian Computation},
  author={Paul Joyce and Paul Marjoram},
  journal={Statistical Applications in Genetics and Molecular Biology},
  year={2008},
  volume={7}
}
  • P. Joyce, P. Marjoram
  • Published 2008
  • Computer Science, Medicine
  • Statistical Applications in Genetics and Molecular Biology
The analysis of high-dimensional data sets is often forced to rely upon well-chosen summary statistics. A systematic approach to choosing such statistics, which is based upon a sound theoretical framework, is currently lacking. In this paper we develop a sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference. Our method can be applied to high-dimensional data sets for which exact likelihood equations are…

Choosing summary statistics by least angle regression for approximate Bayesian computation
Bayesian statistical inference relies on the posterior distribution. Depending on the model, the posterior can be more or less difficult to derive. In recent years, there has been a lot of…
Local dimension reduction of summary statistics for likelihood-free inference
TLDR
A localization strategy is introduced for any projection-based dimension reduction method, in which the transformation is estimated in the neighborhood of the observed data instead of the whole space, to improve the estimation accuracy for localized versions of linear regression and partial least squares.
Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits
TLDR
This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidate summary statistics as a multi-armed bandit problem, which allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well-performing summary statistics as opposed to a fixed set of statistics.
On Optimal Selection of Summary Statistics for Approximate Bayesian Computation
  • M. Nunes, D. Balding
  • Medicine, Mathematics
  • Statistical Applications in Genetics and Molecular Biology
  • 2010
TLDR
It was found that the optimal set of summary statistics was highly dataset specific, suggesting that more generally there may be no globally-optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.
Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC
TLDR
This work shows how to construct appropriate summary statistics for ABC in a semi-automatic manner, and shows that optimal summary statistics are the posterior means of the parameters, while these cannot be calculated analytically.
A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation
TLDR
An approach for choosing summary statistics based on boosting, a technique from the machine-learning literature, is proposed and it is found that ABC with summary statistics chosen locally via boosting with the L2-loss performs best.
Choice of Summary Statistic Weights in Approximate Bayesian Computation
  • Hsuan Jung, P. Marjoram
  • Computer Science, Medicine
  • Statistical Applications in Genetics and Molecular Biology
  • 2011
In this paper, we develop a Genetic Algorithm that can address the fundamental problem of how one should weight the summary statistics included in an approximate Bayesian computation analysis built…
Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation (with Discussion)
TLDR
This work shows how to construct appropriate summary statistics for ABC in a semi-automatic manner, and shows that optimal summary statistics are the posterior means of the parameters.
Simulation-based Bayesian analysis of complex data
TLDR
This paper argues for the advantage of a simulation-based approximate Bayesian method that remains tractable when tractability of other methods is lost, and demonstrates the utility of simulation-based analyses of large datasets within a rigorous statistical framework.
