Model Selection and Parameter Inference in Phylogenetics Using Nested Sampling

@article{Russel2019ModelSA,
  title={Model Selection and Parameter Inference in Phylogenetics Using Nested Sampling},
  author={Patricio Maturana Russel and Brendon J. Brewer and Steffen Klaere and Remco R. Bouckaert},
  journal={Systematic Biology},
  year={2019},
  volume={68},
  pages={219--233}
}
Bayesian inference methods rely on numerical algorithms for both model selection and parameter inference. In general, these algorithms require a high computational effort to yield reliable estimates. One of the major challenges in phylogenetics is the estimation of the marginal likelihood. This quantity is commonly used for comparing different evolutionary models, but its calculation, even for simple models, incurs high computational cost. Another interesting challenge relates to the…

Quantifying the impact of an inference model in Bayesian phylogenetics
TLDR
Pirouette is a free and open-source R package that assesses the inference error made by Bayesian phylogenetics for a given macroevolutionary diversification model. It makes use of BEAST2, but its philosophy applies to any Bayesian clustering inference tool.
Marginal likelihoods in phylogenetics: a review of methods and applications
TLDR
An intuitive description of marginal likelihoods, and of how they can be used to learn about models of evolution from biological data, is provided, and future directions that promise to improve the approximation of marginal likelihoods and Bayesian phylogenetics as a whole are discussed.
Marginal likelihoods in phylogenetics: a review of methods and applications.
TLDR
This work categorizes and reviews methods for estimating marginal likelihoods of phylogenetic models, highlighting several recent methods that provide well-behaved estimates and discussing the challenges of Bayesian model choice and future directions that promise to improve the approximation of marginal likelihoods and Bayesian phylogenetics as a whole.
19 dubious ways to compute the marginal likelihood of a phylogenetic tree topology.
TLDR
This work benchmarks the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real datasets under the JC69 model, and shows that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden.
Scalable total-evidence inference from molecular and continuous characters in a Bayesian framework
TLDR
This work implements, benchmarks and validates popular phylogenetic models for the study of paleontological and neontological continuous trait data, incorporating these models into the BEAST2 platform and illustrating and advancing the paradigm of Bayesian, probabilistic total evidence.
Bayesian Evaluation of Temporal Signal in Measurably Evolving Populations
TLDR
The results indicate that BETS is an effective alternative to other measures of temporal signal, which has the key advantage of allowing a coherent assessment of the entire model, including the molecular clock and tree prior which are essential aspects of Bayesian phylodynamic analyses.
Fundamental Identifiability Limits in Molecular Epidemiology
TLDR
It is shown that in the absence of strong constraints or rate priors across the entire study period, neither maximum-likelihood fitting nor Bayesian inference can reliably reconstruct the true epidemiological dynamics from phylogenetic data alone; rather, estimators can only converge to the "congruence class" of the true dynamics.
BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis
TLDR
The full range of new tools and models available on the BEAST 2.5 platform are described, which expand joint evolutionary inference in many new directions, especially for joint inference over multiple data types, non-tree models and complex phylodynamics.
Coupled MCMC in BEAST 2
TLDR
It is shown that the implemented coupled MCMC approach is exploring the same posterior probability space as regular MCMC when MCMC behaves well, and is able to retrieve more consistent estimates of tree distributions on a dataset where convergence with MCMC is problematic.

References

SHOWING 1-10 OF 71 REFERENCES
Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty.
TLDR
A "working" distribution is introduced on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty, and two different "working" distributions are proposed that help generalized stepping-stone sampling (GSS) to outperform path sampling (PS) and stepping-stone sampling (SS) in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples.
Improving marginal likelihood estimation for Bayesian phylogenetic model selection.
TLDR
A new method, steppingstone sampling (SS), is introduced; it uses importance sampling to estimate each ratio in a series (the "stepping stones") bridging the posterior and prior distributions. It is concluded that the greatly increased accuracy of the SS and thermodynamic integration (TI) methods argues for their use instead of the harmonic mean (HM) method, despite the extra computation needed.
Choosing among Partition Models in Bayesian Phylogenetics
TLDR
A new, more accurate method for estimating the marginal likelihood of a model is presented, and a comparison with the HM method on both simulated and empirical data shows that the generalized SS method tends to choose simpler partition schemes that are more in line with expectation based on inferred patterns of molecular evolution.
Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses
TLDR
These methods are combined with two new diagnostic plots for assessing posterior samples of tree topologies, and provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses.
Markov Chain Monte Carlo Algorithms for the Bayesian Analysis of Phylogenetic Trees
We further develop the Bayesian framework for analyzing aligned nucleotide sequence data to reconstruct phylogenies, assess uncertainty in the reconstructions, and perform other statistical
Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty.
TLDR
It is shown that PS and SS sampling substantially outperform these estimators, and the conclusions made concerning previous analyses are adjusted for the three real-world data sets that were reanalyzed.
Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution
TLDR
It is argued that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference
TLDR
The results of the method are found to be insensitive to changes in the rate parameter of the branching process, and the best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions.
Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics.
TLDR
A comparison with recent implementations of path sampling and stepping-stone sampling shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model.
Guided tree topology proposals for Bayesian phylogenetic inference.
TLDR
This work investigates the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets, and introduces two new Metropolized Gibbs Samplers for moving through "tree space".