19 dubious ways to compute the marginal likelihood of a phylogenetic tree topology.

Authors: Mathieu Fourment, Andrew F. Magee, Chris Whidden, Arman Bilge, Frederick Albert Matsen IV, Vladimir N. Minin
Journal: Systematic Biology
The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high… 
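
The definition in the abstract can be written out as follows. The symbols are assumptions introduced here for illustration and are not taken from the paper: $D$ is the data, $M$ the model, and $\theta \in \Theta$ the model parameters.

```latex
p(D \mid M) = \int_{\Theta} p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta,
\qquad
p(\theta \mid D, M) = \frac{p(D \mid \theta, M)\, p(\theta \mid M)}{p(D \mid M)}
```

The first expression is the integral of likelihood times prior over the parameters; the second shows the same quantity acting as the normalizing constant of the posterior density.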

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies

This paper presents an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, and shows that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies.

Parallel power posterior analyses for fast computation of marginal likelihoods in phylogenetics

A general parallelization strategy is introduced that distributes the power posterior MCMC simulations and the likelihood computations over available CPUs, so that marginal likelihood estimates that previously needed days, weeks or even months complete in a feasible amount of time.

Efficient Bayesian inference of general Gaussian models on large phylogenetic trees

A scalable Bayesian inference framework under a general Gaussian trait evolution model that exploits Hamiltonian Monte Carlo enables efficient sampling of the constrained model parameters and takes advantage of the tree structure for fast likelihood and gradient computations, yielding algorithmic complexity linear in the number of observations.

Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference

This paper proposes an approach and an implementation framework to relax the rigidity of the prior densities by learning their parameters using a gradient-based method and a neural network-based parameterization, and highlights that using neural networks improves the initialization of the optimization of the prior density parameters.


A scalable Bayesian framework under a general Gaussian trait evolution model that enables efficient sampling of the constrained model parameters and takes advantage of the tree structure for fast likelihood and gradient computations is presented.

Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics

It is shown that many commonly used phylogenetic models including the general time reversible (GTR) substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language.

Computing Bayes: Bayesian Computation from 1763 to the 21st Century

This paper takes the reader on a chronological tour of Bayesian computation over the past two and a half centuries, places all computational problems into a common framework, and describes all computational methods using a common notation.

Bayesian Evaluation of Temporal Signal in Measurably Evolving Populations

The results indicate that BETS is an effective alternative to other measures of temporal signal, which has the key advantage of allowing a coherent assessment of the entire model, including the molecular clock and tree prior which are essential aspects of Bayesian phylodynamic analyses.

Parsimony analysis of phylogenomic datasets (I): scripts and guidelines for using TNT (Tree Analysis using New Technology)

TNT, the computationally most efficient and versatile parsimony software, is described for use in phylogenetic and phylogenomic analyses, together with a series of scripts specifically designed for the analysis of phylogenomic datasets.

Lagged couplings diagnose Markov chain Monte Carlo phylogenetic inference

Phylogenetic inference is an intractable statistical problem on a complex space. Markov chain Monte Carlo methods are the primary tool for Bayesian phylogenetic inference but it is challenging to



Model Selection and Parameter Inference in Phylogenetics Using Nested Sampling

Nested sampling is introduced to phylogenetics and its performance is analysed under different scenarios and compared to established methods to conclude that NS is a competitive and attractive algorithm for phylogenetic inference.

Improving marginal likelihood estimation for Bayesian phylogenetic model selection.

A new method, steppingstone sampling (SS), is introduced, which uses importance sampling to estimate each ratio in a series (the "stepping stones") bridging the posterior and prior distributions. The authors conclude that the greatly increased accuracy of the SS and TI methods argues for their use instead of the HM method, despite the extra computation needed.
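
The stepping-stone idea can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation and not a phylogenetic model: it assumes a conjugate normal model (observations $y_i \sim N(\theta, 1)$, prior $\theta \sim N(0, 1)$) so that each power posterior can be sampled exactly and the true marginal likelihood is available in closed form for comparison. The function names, the number of stones, and the schedule $\beta_k = (k/K)^{1/\alpha}$ with $\alpha = 0.3$ are all assumptions made for this example.

```python
import math
import random


def make_log_likelihood(y):
    """Return log p(y | theta) for y_i ~ N(theta, 1), using sufficient statistics."""
    n, s1, s2 = len(y), sum(y), sum(v * v for v in y)

    def log_lik(theta):
        sq = s2 - 2.0 * theta * s1 + n * theta * theta  # sum of (y_i - theta)^2
        return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * sq

    return log_lik


def stepping_stone_log_ml(y, n_stones=32, n_samples=2000, alpha=0.3, seed=1):
    """Estimate log p(y) as a sum of log importance-sampling ratios."""
    rng = random.Random(seed)
    n, s1 = len(y), sum(y)
    log_lik = make_log_likelihood(y)
    # Powers from 0 (prior) to 1 (posterior), concentrated near the prior.
    betas = [(k / n_stones) ** (1.0 / alpha) for k in range(n_stones + 1)]
    log_ml = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # Power posterior at b_prev is normal here (toy-model assumption).
        precision = 1.0 + b_prev * n
        mean, sd = b_prev * s1 / precision, 1.0 / math.sqrt(precision)
        # Log importance weights: likelihood raised to the power gap.
        w = [(b_next - b_prev) * log_lik(rng.gauss(mean, sd))
             for _ in range(n_samples)]
        # Log of the mean weight, computed stably by factoring out the maximum.
        w_max = max(w)
        log_ml += w_max + math.log(sum(math.exp(v - w_max) for v in w) / n_samples)
    return log_ml


def exact_log_ml(y):
    """Closed-form log p(y): y is multivariate normal with covariance I + 11^T."""
    n, s1, s2 = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(1.0 + n)
            - 0.5 * (s2 - s1 * s1 / (1.0 + n)))
```

Calling `stepping_stone_log_ml(y)` and `exact_log_ml(y)` on the same list of numbers allows the estimate to be checked against the analytic value. In a real phylogenetic analysis the exact sampler would be replaced by MCMC draws from each power posterior.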

Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times.

The use of Taylor expansion to approximate the likelihood during Markov chain Monte Carlo iteration is explored, and the results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.

Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty.

A "working" distribution is introduced on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty, and two different "working" distributions are proposed that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples.

Stochastic Variational Inference for Bayesian Phylogenetics: A Case of CAT Model

A variational Bayesian procedure is presented to speed up the widely used PhyloBayes MPI program, which deals with the heterogeneity of amino acid profiles. The procedure accurately approximated the Bayesian phylogenetic tree, mixture proportions, and the amino acid propensity of each component of the mixture while using orders of magnitude less computational time.

Computing Bayes factors using thermodynamic integration.

The present article proposes to employ another method, based on an analogy with statistical physics, called thermodynamic integration, which is applied to the comparison of several alternative models of amino-acid replacement, indicating that modeling pattern heterogeneity across sites tends to yield better models than standard empirical matrices.
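
For orientation, the standard thermodynamic integration identity can be stated as follows. The notation is an assumption introduced here, not taken from the article: $p_\beta$ is the power posterior at inverse temperature $\beta \in [0, 1]$, and the prior is assumed to be proper.

```latex
p_\beta(\theta) \propto p(D \mid \theta)^{\beta}\, p(\theta),
\qquad
\log p(D) = \int_0^1 \mathbb{E}_{p_\beta}\!\left[\log p(D \mid \theta)\right] d\beta
```

In practice the integral is approximated numerically from MCMC samples drawn at a grid of $\beta$ values between the prior ($\beta = 0$) and the posterior ($\beta = 1$).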

Variational Upper Bounds for Probabilistic Phylogenetic Models

A new approximation method is presented, applicable for a wide range of probabilistic models, which guarantees to upper bound the true likelihood of data, and is complementary to known variational methods that lower bound the likelihood.

Bayesian estimation of divergence times from large sequence alignments.

S. Guindon. Biology. Molecular Biology and Evolution, 2010.
A new approach that estimates the posterior density of substitution rates and node times using a Gibbs sampling algorithm is described, demonstrating the suitability of this new method for analyzing large and/or difficult data sets.

Estimating Bayesian Phylogenetic Information Content

This work focuses on measuring information about tree topology using marginal posterior distributions of tree topologies and shows that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade.

Efficient approximations for learning phylogenetic HMM models from data

The phylogenetic-HMM model, which generalizes the classical probabilistic models of Neyman and Felsenstein, is considered, and it is demonstrated that, unlike the other approximations, variational methods are accurate and are guaranteed to lower bound the likelihood.