Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference

Bruce Rannala and Ziheng Yang
Journal of Molecular Evolution
A new method is presented for inferring evolutionary trees using nucleotide sequence data. The birth-death process is used as a model of speciation and extinction to specify the prior distribution of phylogenies and branching times. Nucleotide substitution is modeled by a continuous-time Markov process. Parameters of the branching model and the substitution model are estimated by maximum likelihood. The posterior probabilities of different phylogenies are calculated and the phylogeny with the… 
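
The step from priors and likelihoods to posterior probabilities of phylogenies can be illustrated with a minimal sketch. It assumes the log-likelihood of each candidate tree and its prior probability are already available. The function name `tree_posteriors` and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
import math

def tree_posteriors(log_likelihoods, priors):
    """Normalize prior * likelihood over a finite set of candidate trees.

    log_likelihoods: log P(data | tree) for each tree (assumed given).
    priors: prior probability of each tree (assumed given, all > 0).
    """
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_joint)
    weights = [math.exp(v - m) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values for three candidate trees with equal prior probability.
posteriors = tree_posteriors([-1000.0, -1002.0, -1005.0], [1 / 3, 1 / 3, 1 / 3])
best_tree = max(range(len(posteriors)), key=posteriors.__getitem__)
```

With equal priors, the tree with the highest likelihood also has the highest posterior probability; unequal priors can change that ranking.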

Fundamental differences between the methods of maximum likelihood and maximum posterior probability in phylogenetics.

Using a four-taxon example under a simple model of evolution, we show that the methods of maximum likelihood and maximum posterior probability (which is a Bayesian method of inference) may not arrive

MetaPIGA 4: an evolutionary computation approach for estimating phylogenetic trees

This hybrid EA, implemented as the MetaPIGA 4 software package, generates results of quality comparable to those produced by current state-of-the-art algorithms for phylogeny inference, and it outperforms both classical EA and hill-climbing algorithms.

Bayesian phylogenetic inference via Monte Carlo methods

The combinatorial sequential Monte Carlo (CSMC) method is proposed to generalize applications of SMC to non-clock tree inference based on the existence of a flexible partially ordered set (poset) structure, and it is presented at a level of generality directly applicable to many other combinatorial spaces.

Branch-length prior influences Bayesian posterior probability of phylogeny.

It is found that posterior probabilities for trees and clades are sensitive to the prior for internal branch lengths, and priors assuming long internal branches cause high posterior probabilities in favor of extreme values.

Exact distribution of divergence times under the birth-death-sampling model

The main result provided here is a method for computing the exact distribution of the divergence times of any phylogenetic tree under a birth-death-sampling model, which has a cubic time-complexity, allowing us to deal with phylogenies of hundreds of tips on standard computers.

Empirical evaluation of a prior for Bayesian phylogenetic inference

Ziheng Yang. Philosophical Transactions of the Royal Society B: Biological Sciences, 2008.
The data size-dependent prior alleviates the problem to some extent, giving weaker support for unstable relationships, and may be useful in reducing apparent conflicts in the results of Bayesian analysis or in making the method less sensitive to model violations.

Modeling Gene-Tree-Species-Tree Conflict with Migration Using Continuous-Time Markov Chains

A novel algorithm is presented for constructing the generator matrix of the Markov chain, which can be used to calculate the probability of gene-tree-species-tree conflict and explore regions of parameter space under which conflict is more probable and thus erroneous estimates of phylogeny by heuristic estimators are more probable.

Calculating bootstrap probabilities of phylogeny using multilocus sequence data.

It is found that concatenation of the multilocus sequence data may result in incorrect phylogeny estimation with an extremely high bootstrap probability (BP), which is due to incorrect estimation of the distances and to deliberately ignoring intergene variation.

The asymptotic behavior of bootstrap support values in molecular phylogenetics

This work considers phylogenetic reconstruction as a problem of statistical model selection when the compared models are nonnested and misspecified, and finds that the bootstrap has qualitatively different dynamics from Bayesian inference.

PAML 4: phylogenetic analysis by maximum likelihood.

PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML), which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses.



Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock.

Bootstrapping is a conservative approach for estimating the reliability of an inferred phylogeny for four taxa by using model trees of three taxa with an outgroup and by assuming a constant rate of nucleotide substitution.

Evaluation of several methods for estimating phylogenetic trees when substitution rates differ over nucleotide sites

Ziheng Yang. Journal of Molecular Evolution, 2004.
Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared under the assumption that substitution rates at sites follow a gamma distribution. The results suggest that the joint likelihood analysis is more robust when rate variation over sites is present but ignored, so that an assumption is violated.

Maximum Likelihood and Minimum-Steps Methods for Estimating Evolutionary Trees from Data on Discrete Characters

The application of maximum likelihood methods to discrete characters is examined, and it is shown that parsimony methods are not maximum likelihood methods under the assumptions made by Farris, and an algorithm which enables rapid calculation of the likelihood of a phylogeny is described.

Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea

A new method for estimating the variance of the difference between log likelihood of different tree topologies is developed by expressing it explicitly in order to evaluate the maximum likelihood branching order among Hominoidea.


The parameter space of the phylogenetic tree estimation problem consists of three components, T, t, and θ. The tree topology T is a discrete entity that is not a proper statistical parameter but

Phylogenetic analysis using parsimony and likelihood methods

Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter, and its performance relative to that of the likelihood method was especially noted.

Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods

Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites, and one of them uses several categories of rates to approximate the gamma distribution, with equal probability for each category.
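
The equal-probability discretization of the gamma distribution mentioned above can be sketched as follows. This is a minimal illustration, not the published implementation. It assumes a gamma distribution with shape `alpha` and rate `alpha` (so the mean rate is 1) and represents each category by its mean, which is one possible choice. All function names are hypothetical.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def gamma_quantile(p, alpha, beta):
    """Quantile of Gamma(shape=alpha, rate=beta), found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, beta * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, beta * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k):
    """Mean rate in each of k equal-probability categories (overall mean 1)."""
    beta = alpha  # assumption: rate equals shape, so the mean rate is 1
    cuts = [gamma_quantile(i / k, alpha, beta) for i in range(1, k)]
    # The integral of x * f(x) up to c equals (alpha / beta) * P(alpha + 1, beta * c).
    upper = [reg_lower_gamma(alpha + 1.0, beta * c) for c in cuts] + [1.0]
    lower = [0.0] + upper[:-1]
    return [k * (u - l) for u, l in zip(upper, lower)]

rates = discrete_gamma_rates(0.5, 4)
```

Each category then receives weight 1/k when the site likelihood is averaged over the rates.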

Robustness of maximum likelihood tree estimation against different patterns of base substitutions

The results are in accordance with those from the simulation study, showing that Jukes and Cantor's model is as useful as a more complicated one for making inferences about molecular phylogeny of the viruses.
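
For reference, the transition probabilities of the Jukes and Cantor model have a simple closed form. The sketch below assumes the branch length `d` is measured in expected substitutions per site. The function name `jc69_transition_matrix` is hypothetical.

```python
import math

BASES = "ACGT"

def jc69_transition_matrix(d):
    """4x4 matrix of P(j | i, d) under the Jukes-Cantor model.

    d: branch length in expected substitutions per site (assumed unit).
    """
    e = math.exp(-4.0 * d / 3.0)
    same = 0.25 + 0.75 * e   # probability that the base is unchanged
    diff = 0.25 - 0.25 * e   # probability of each specific different base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

P = jc69_transition_matrix(0.1)
p_a_to_g = P[BASES.index("A")][BASES.index("G")]
```

As `d` grows, every entry approaches 1/4, which is the stationary base frequency under this model.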


From the elucidation of implicit models underlying traditional "parsimony" and "compatibility" analyses, it is seen that Poisson process analysis gives a statistically consistent estimate of phylogeny, and that parsimony methods do indeed have a maximum likelihood foundation but give potentially incorrect estimates of phylogenies.

Success of maximum likelihood phylogeny inference in the four-taxon case.

Although both models were inconsistent for some branch-length combinations in the presence of site-to-site variation, the models were efficient predictors of topology under most simulation conditions.