A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

@article{Guindon2003ASF,
  title={A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood},
  author={St{\'e}phane Guindon and Olivier Gascuel},
  journal={Systematic Biology},
  year={2003},
  volume={52},
  number={5},
  pages={696--704}
}
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method…

RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees

This paper presents the latest release of the program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees, which allows for computation of 1,000-taxon trees in less than 24 hours on a single PC processor.

Evaluating the performance of a successive-approximations approach to parameter optimization in maximum-likelihood phylogeny estimation.

The results demonstrate that successive approximation is reliable and provide reassurance that this much faster approach is safe to use for ML estimation of topology, as long as the heuristic searches of tree space are rigorous.

A fast program for maximum likelihood-based inference of large phylogenetic trees

A novel, partially randomized algorithm and new parsimony-based rearrangement heuristics are implemented in a sequential and parallel program called RAxML, which shows run-time improvements of more than 25% over parallel fastDNAml while yielding exactly the same results.

FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

Improvements to FastTree are described that improve its accuracy without sacrificing scalability, and FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments.

IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies

It is shown that a combination of hill-climbing approaches and a stochastic perturbation method can be implemented time-efficiently, and that it found higher likelihoods for between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree space.

Calculating the evolutionary rates of different genes: a fast, accurate estimator with applications to maximum likelihood phylogenetic analysis.

The DistR method was applied to a fungal mitochondrial data set, and the rate estimates compared well to those obtained using existing ML and Bayesian approaches, and bootstrap support for the ML topology was significantly greater when protein rates were used.

RAxML and FastTree: Comparing Two Methods for Large-Scale Maximum Likelihood Phylogeny Estimation

This study shows that very large phylogenies can be estimated very quickly using FastTree, with little (and in some cases no) degradation in tree accuracy, as compared to RAxML.

An efficient program for phylogenetic inference using simulated annealing

  • A. Stamatakis
  • Biology
    19th IEEE International Parallel and Distributed Processing Symposium
  • 2005
A new program RAxML-SA (Randomized Axelerated Maximum Likelihood with Simulated-Annealing) is presented that combines simulated annealing and hill-climbing techniques to improve the quality of final trees.

On a New Quartet-Based Phylogeny Reconstruction Algorithm

It is common that certain incorrect trees can have likelihood values at least as large as that of the correct tree, suggesting that even if a truly globally optimal tree under the maximum likelihood criterion can be found, this tree may not necessarily be the correct phylogenetic tree.

Estimating phylogenies under maximum likelihood: A very large-scale neighborhood approach

This work adapts Very Large-Scale Neighborhood (VLSN) techniques to estimate phylogenies of large datasets of nucleotide sequences under the maximum likelihood criterion, and shows that the use of VLSN techniques speeds up convergence to topological local optima and increases the overall performance of stochastic-based search algorithms.
...

References

NJML: a hybrid algorithm for the neighbor-joining and maximum-likelihood methods.

  • S. Ota, W. Li
  • Biology
    Molecular biology and evolution
  • 2000
A "divide-and-conquer" heuristic algorithm in which an initial neighbor-joining (NJ) tree is divided into subtrees at internal branches having bootstrap values higher than a threshold, which is suitable for reconstructing relatively large molecular phylogenetic trees.

Stochastic search strategy for estimation of maximum likelihood phylogenetic trees.

A stochastic search strategy for estimation of the ML tree that is based on a simulated annealing algorithm that is less likely to become trapped in local optima than are existing algorithms for ML tree estimation.

Improvement of distance-based phylogenetic methods by a local maximum likelihood approach using triplets.

A new approach to estimate the evolutionary distance between two sequences using a tree with three leaves, which improves the precision of evolutionary distance estimates, and thus the topological accuracy of distance-based methods.

Multiple maxima of likelihood in phylogenetic trees: an analytic approach

A new approach to calculating ML directly is reported, which is used to find large families of sequences that have multiple optima, including sequences with a continuum of optimal points, and implies that hill climbing techniques cannot guarantee to find the global ML point, even if it is unique.

A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates.

Parsimony and compatibility had particular difficulty with inaccuracy and bias when substitution rates varied among different branches, and maximum likelihood was the most successful method overall, although for short sequences Fitch-Margoliash and neighbor joining were sometimes better.

A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data.

  • P. Lewis
  • Biology
    Molecular biology and evolution
  • 1998
The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A

Methods for obtaining approximate estimates of branch lengths for codon models are explored and the estimates were used to test for positive selection and to identify sites under selection in the viral gene under diversifying Darwinian selection.

Traditional phylogenetic reconstruction methods reconstruct shallow and deep evolutionary relationships equally well.

Simulation is extended to include the ME, MP, and ML methods to examine how these methods perform under the Jukes-Cantor (JC) model (Jukes and Cantor 1969) and the more complex Hasegawa, Kishino, and Yano (1985) model of nucleotide substitution.

Quartet-based phylogenetic inference: improvements and limits.

WO, a new algorithm that is also based on weighted 4-trees, is faster and offers better theoretical guarantees than QP, and computer simulations indicate that the topological accuracy of WO is less dependent on the shape of the correct tree.

The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation

  • A. Lemmon, M. Milinkovitch
  • Computer Science, Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 2002
The metapopulation genetic algorithm, involving several populations of trees that are forced to cooperate in the search for the optimal tree, proves to be both very accurate and vastly faster than existing heuristics, such that data sets comprised of hundreds of taxa can be analyzed in practical computing times under complex models of maximum-likelihood evolution.
...