PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment.

Nicolas Lartillot, Nicolas Rodrigue, Daniel Stubbs, Jacques Richer
Systematic Biology, volume 62, issue 4

Modeling across-site variation of the substitution process is increasingly recognized as important for obtaining more accurate phylogenetic reconstructions. Both finite and infinite mixture models have been proposed and shown to significantly improve on classical single-matrix models. Compared with their finite counterparts, infinite mixtures have greater expressivity. However, they are computationally more challenging. This has resulted in practical compromises in the design of…

Stochastic Variational Inference for Bayesian Phylogenetics: A Case of CAT Model

A variational Bayesian procedure is proposed to speed up the widely used PhyloBayes MPI program, which handles heterogeneity of amino acid profiles. The procedure accurately approximated the Bayesian phylogenetic tree, the mixture proportions, and the amino acid propensity of each mixture component while using orders of magnitude less computational time.

Stochastic Variational Inference of Mixture Models in Phylogenetics

A variational Bayesian procedure is proposed to speed up the widely used PhyloBayes MPI program, which handles heterogeneity of amino acid propensity. The procedure accurately approximated the Bayesian phylogenetic tree, the mixture proportions, and the amino acid propensity of each mixture component while using orders of magnitude less computational time.

Site-heterogeneous mutation-selection models within the PhyloBayes-MPI package

A scalable, message-passing-interface-based Bayesian implementation of site-heterogeneous codon models in the mutation-selection framework that jointly infers the global mutational parameters at the nucleotide level, the branch lengths of the tree and a Dirichlet process governing across-site variation at the amino acid level.

Accelerating Bayesian inference for evolutionary biology models

A parallel Metropolis‐Hastings (M‐H) framework, built with a novel combination of enhancements aimed at parameter‐rich and complex models, converges up to twenty times faster than the well‐known software MrBayes when estimating the posterior probability of phylogenetic trees using 32 processors.

PhyloBayes: Bayesian Phylogenetics Using Site-heterogeneous Models

This chapter provides a detailed step-by-step practical introduction to phylogenetic analyses using PhyloBayes, using as an example a previously published dataset addressing the phylogenetic position of Microsporidia within eukaryotes.

Modeling Site Heterogeneity with Posterior Mean Site Frequency Profiles Accelerates Accurate Phylogenomic Estimation

PMSF provided more accurate estimates of phylogenies than the mixture models from which they derive and allows full nonparametric bootstrap analyses to be conducted under complex site‐heterogeneous models on large concatenated data matrices.

Who Let the CAT Out of the Bag? Accurately Dealing with Substitutional Heterogeneity in Phylogenomic Analyses

It is concluded that partitioning and CAT‐GTR perform similarly in recovering accurate branching patterns; however, computation time can be orders of magnitude less for data partitioning, with commonly used implementations of CAT‐GTR often failing to reach completion in a reasonable time frame.

The Relative Importance of Modeling Site Pattern Heterogeneity Versus Partition-Wise Heterotachy in Phylogenomic Inference.

In cases of extreme protein-wise heterotachy, approaches that cluster similarly evolving proteins together and couple them with site-heterogeneous models worked well for phylogenetic estimation, whereas site-homogeneous models (with or without partitioning) did not.

Dissecting phylogenetic signal and accounting for bias in whole-genome data sets: a case study of the Metazoa

This study assembles a novel data set comprising 1,080 orthologous loci derived from 36 publicly available genomes and dissects the phylogenetic signal present in each individual partition, providing a workflow for minimizing systematic bias in whole genome-based phylogenetic analyses.

Accelerated Estimation of Frequency Classes in Site‐Heterogeneous Profile Mixture Models

A composite likelihood approach to estimation of component frequencies for a mixture model that directly uses the data from the alignment of interest is proposed, and in simulations, the approach is shown to provide large improvements over hierarchical clustering.



Generalized mixture models for molecular phylogenetic estimation.

A mixture model approach that uses reversible jump Markov chain Monte Carlo (MCMC) estimation to permit as many distinct models as the data require, and that also permits hard polytomies (i.e., zero-length internal branches).

Uniformization for sampling realizations of Markov processes: applications to Bayesian implementations of codon substitution models

A general method, based on a uniformization technique, which can be utilized to generate realizations of a Markovian substitution process conditional on an alignment of character states and a given tree topology is described.

Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model

The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment.

Conjugate Gibbs Sampling for Bayesian Phylogenetic Models

The conjugate Gibbs formalism allows one to propose efficient implementations of complex models, for instance assuming site-specific substitution processes, that would not be accessible to standard MCMC methods.

PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating

A software package, PhyloBayes 3, is proposed, which can be used for conducting Bayesian phylogenetic reconstruction and molecular dating analyses, using a large variety of amino acid replacement and nucleotide substitution models, including empirical mixtures or non-parametric models, as well as alternative clock relaxation processes.

Phylogenetic mixture models for proteins

This paper explores, in a maximum-likelihood framework, phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution, and shows that highly significant likelihood gains are obtained when using mixture models compared with the best available single replacement matrices.

Improvement of molecular phylogenetic inference and the phylogeny of Bilateria

The sister-group relationship of Platyhelminthes and Annelida to the exclusion of Mollusca, contradicting the Neotrochozoa hypothesis, and, with a lower statistical support, the paraphyly of Deuterostomia are discussed in an evo–devo framework.

A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process.

The results suggest that the complexity of the pattern of substitution of real sequences is better captured by the CAT model, offering the possibility of studying its impact on phylogenetic reconstruction and its connections with structure-function determinants.

Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough

Three recent large-scale phylogenomics studies, which deal with the early diversification of animals, produced highly incongruent findings despite the use of considerable sequence data, suggesting that merely adding more sequences is not enough to resolve the inconsistencies.

A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data.

A general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data that simplifies to a homogeneous model or a rate-variability model as special cases and always performs at least as well as these two approaches, and often considerably improves upon them.