Phylogenetic Tree Construction Using Markov Chain Monte Carlo

Shuying S. Li, Dennis K. Pearl, Hani Doss. Journal of the American Statistical Association, pp. 493-508.
Abstract: We describe a Bayesian method based on Markov chain simulation to study the phylogenetic relationship in a group of DNA sequences. Under simple models of mutational events, our method produces a Markov chain whose stationary distribution is the conditional distribution of the phylogeny given the observed sequences. Our algorithm strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas…
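To make the central idea concrete (a Markov chain whose stationary distribution is the posterior over phylogenies), here is a minimal Metropolis sketch in Python. It is not the authors' algorithm: the state space is just the three rooted topologies on three taxa, the unnormalized posterior weights in `WEIGHTS` are invented for illustration, and the function name `metropolis_topologies` is hypothetical. In a real analysis the weights would come from a likelihood under a mutation model times a prior.

```python
import random
from collections import Counter

# Hypothetical unnormalized posterior weights for the three rooted
# topologies on taxa A, B, C. These numbers are invented for illustration.
WEIGHTS = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}


def metropolis_topologies(weights, n_steps, seed=0):
    """Run a Metropolis chain over the keys of `weights`."""
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: jump to one of the other topologies uniformly.
        proposal = rng.choice([s for s in states if s != current])
        # Accept with probability min(1, w(proposal) / w(current)).
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        visits[current] += 1
    return {s: visits[s] / n_steps for s in states}


frequencies = metropolis_topologies(WEIGHTS, n_steps=200_000)
total = sum(WEIGHTS.values())
for topology, freq in frequencies.items():
    print(f"{topology}: sampled {freq:.3f}, target {WEIGHTS[topology] / total:.3f}")
```

Because the proposal is symmetric, the acceptance ratio only needs the ratio of unnormalized weights, so the normalizing constant of the posterior never has to be computed. The long-run visit frequencies should approach the normalized weights.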

Bayesian Phylogenetic Inference via Markov Chain Monte Carlo Methods

A Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms is derived, generating reproducible estimates and credible sets for the path of evolution.

Limitations of Markov chain Monte Carlo algorithms for Bayesian inference of phylogeny

This paper presents the first theoretical work analyzing the rate of convergence of several Markov chains widely used in phylogenetic inference, and proves that many of the popular Markov chains take exponentially long to reach their stationary distribution.

Markov chain Monte Carlo for the Bayesian analysis of evolutionary trees from aligned molecular sequences

The challenging part is to approximate the posterior, and this is done by constructing a Markov chain having the posterior as its invariant distribution, following the approach of Mau, Newton, and Larget (1998).

Bayesian phylogenetic inference via Monte Carlo methods

The combinatorial sequential Monte Carlo (CSMC) method is proposed to generalize applications of SMC to non-clock tree inference based on the existence of a flexible partially ordered set (poset) structure, and it is presented at a level of generality directly applicable to many other combinatorial spaces.

Markov chain Monte Carlo and its applications to phylogenetic tree construction

This thesis formulates a novel Bayesian model for phylogenetic tree construction that, building on recent studies, incorporates known information about the evolutionary history of the species (the species phylogeny) in a statistically rigorous way, and develops an inference algorithm based on a Markov chain Monte Carlo method to overcome the computational complexity inherent in the problem.

Phylogenetic MCMC Algorithms Are Misleading on Mixtures of Trees

It is proved that the Markov chains take an exponentially long number of iterations to converge to the posterior distribution, which means that in cases of data containing potentially conflicting phylogenetic signals, phylogenetic reconstruction should be performed separately on each signal.

Guided tree topology proposals for Bayesian phylogenetic inference.

This work investigates the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets, and introduces two new Metropolized Gibbs Samplers for moving through "tree space".

Parallel algorithms for Bayesian phylogenetic inference

Bayesian selection of continuous-time Markov chain evolutionary models.

A reversible jump Markov chain Monte Carlo approach to estimating the posterior distribution of phylogenies based on aligned DNA/RNA sequences under several hierarchical evolutionary models is developed; it is found that the Kimura model is too restrictive and that the Hasegawa, Kishino, and Yano model can be rejected for some data sets.



Bayesian Phylogenetic Inference via Markov Chain Monte Carlo Methods

A Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms is derived, generating reproducible estimates and credible sets for the path of evolution.

Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method.

An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree, which has a probability of approximately 95%.

Markov Chain Monte Carlo Algorithms for the Bayesian Analysis of Phylogenetic Trees

We further develop the Bayesian framework for analyzing aligned nucleotide sequence data to reconstruct phylogenies, assess uncertainty in the reconstructions, and perform other statistical

Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock.

Bootstrapping is a conservative approach for estimating the reliability of an inferred phylogeny for four taxa by using model trees of three taxa with an outgroup and by assuming a constant rate of nucleotide substitution.
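As a rough illustration of what bootstrap estimation of phylogenetic variability involves, the sketch below resamples alignment columns (sites) with replacement and records how often a grouping is recovered. Everything here is an assumption for illustration: the three toy sequences in `ALIGNMENT` are invented, the helper names are hypothetical, and "the pair of taxa with the fewest mismatches" is only a crude stand-in for a real tree-building method. It does not reproduce the four-taxon, molecular-clock setting analyzed in the paper.

```python
import random
from collections import Counter
from itertools import combinations

# Toy aligned sequences, invented for illustration only.
ALIGNMENT = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGAACGT",
    "C": "TCGAACTTAGGTCCGTACAT",
}


def closest_pair(alignment, columns):
    """Return the pair of taxa with the fewest mismatches over `columns`."""
    def mismatches(pair):
        x, y = pair
        return sum(alignment[x][i] != alignment[y][i] for i in columns)
    # Ties go to the first pair in sorted order.
    return min(combinations(sorted(alignment), 2), key=mismatches)


def bootstrap_support(alignment, n_replicates=1000, seed=0):
    """Resample alignment columns with replacement and tally the closest pair."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    tally = Counter()
    for _ in range(n_replicates):
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        tally[closest_pair(alignment, columns)] += 1
    return {pair: count / n_replicates for pair, count in tally.items()}


support = bootstrap_support(ALIGNMENT)
for pair, proportion in sorted(support.items()):
    print(pair, proportion)
```

The proportion of replicates that recover a grouping is the bootstrap support for it. How that proportion relates to the actual probability that the grouping is correct is the kind of question the paper above addresses.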

A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data.

P. Lewis. Molecular Biology and Evolution, 1998.
The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.

Stochastic search strategy for estimation of maximum likelihood phylogenetic trees.

A stochastic search strategy for estimating the ML tree, based on a simulated annealing algorithm, that is less likely to become trapped in local optima than existing algorithms for ML tree estimation.

Practical Markov Chain Monte Carlo

The case is made for basing all inference on one long run of the Markov chain and estimating the Monte Carlo error by standard nonparametric methods well-known in the time-series and operations research literature.
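One standard nonparametric way to estimate Monte Carlo error from a single long run is the method of batch means. The sketch below is illustrative only and is not claimed to be the specific estimator recommended in the paper: an AR(1) series stands in for MCMC output, and the function names, the autocorrelation `rho=0.9`, and the choice of 50 batches are all assumptions.

```python
import math
import random
import statistics


def ar1_chain(n, rho=0.9, seed=0):
    """Simulate a mean-zero AR(1) series as a stand-in for MCMC output."""
    rng = random.Random(seed)
    x = 0.0
    chain = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


def batch_means_se(chain, n_batches=50):
    """Estimate the standard error of the chain mean from non-overlapping batches."""
    batch_size = len(chain) // n_batches
    means = [
        statistics.fmean(chain[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
    # The batch means are treated as roughly independent draws.
    return statistics.stdev(means) / math.sqrt(n_batches)


chain = ar1_chain(100_000)
naive_se = statistics.stdev(chain) / math.sqrt(len(chain))
bm_se = batch_means_se(chain)
print(f"naive SE (ignores autocorrelation): {naive_se:.4f}")
print(f"batch-means SE: {bm_se:.4f}")
```

For a positively autocorrelated chain, the naive standard error understates the Monte Carlo error, while the batch-means estimate accounts for the correlation provided each batch is much longer than the chain's autocorrelation time.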

Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling.

We present a new way to make a maximum likelihood estimate of the parameter 4N mu (effective population size times mutation rate per site, or theta) based on a population sample of molecular