A Review of Approaches for Optimizing Phylogenetic Likelihood Calculations

@article{Stamatakis2019ARO,
  title={A Review of Approaches for Optimizing Phylogenetic Likelihood Calculations},
  author={Alexandros Stamatakis},
  journal={Bioinformatics and Phylogenetics},
  year={2019}
}
  • A. Stamatakis
  • Published 2019
  • Biology
  • Bioinformatics and Phylogenetics
The execution times of likelihood-based phylogenetic inference tools for Maximum Likelihood or Bayesian inference are dominated by the Phylogenetic Likelihood Function (PLF). The PLF is executed millions of times in such analyses and accounts for 85–95% of overall run time. In addition, storing the Conditional Likelihood Vectors (CLVs) required for computing the Phylogenetic Likelihood Function largely determines the associated memory consumption. Storing CLVs accounts for approximately 80% of… 
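To make the role of the CLVs concrete, here is a minimal sketch of the pruning step at the core of the PLF. It assumes the Jukes-Cantor substitution model, unambiguous DNA characters (A, C, G, T only), and a small rooted tree. The function names (`jc69_matrix`, `tip_clv`, `inner_clv`, `log_likelihood`) are illustrative and are not taken from the paper or from any particular tool. Each inner node holds one 4-entry vector per alignment site, which is why CLV storage tends to dominate memory use.

```python
import math

STATES = "ACGT"  # assumption: unambiguous DNA only, no gaps or ambiguity codes


def jc69_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def tip_clv(seq):
    """Tip CLV: per site, 1.0 for the observed state and 0.0 otherwise."""
    return [[1.0 if s == c else 0.0 for s in STATES] for c in seq]


def inner_clv(left, t_left, right, t_right):
    """Combine two child CLVs into the parent CLV (one pruning step)."""
    pl, pr = jc69_matrix(t_left), jc69_matrix(t_right)
    parent = []
    for lv, rv in zip(left, right):
        site = []
        for i in range(4):
            from_left = sum(pl[i][j] * lv[j] for j in range(4))
            from_right = sum(pr[i][j] * rv[j] for j in range(4))
            site.append(from_left * from_right)
        parent.append(site)
    return parent


def log_likelihood(root):
    """Sum per-site log-likelihoods using uniform base frequencies (0.25)."""
    return sum(math.log(sum(0.25 * x for x in site)) for site in root)


# Hypothetical three-taxon tree ((A:0.1,B:0.1):0.05,C:0.2)
ab = inner_clv(tip_clv("ACGT"), 0.1, tip_clv("ACGA"), 0.1)
root = inner_clv(ab, 0.05, tip_clv("ACGT"), 0.2)
print(log_likelihood(root))
```

A tree search re-evaluates steps like `inner_clv` for many nodes and many candidate topologies, which is consistent with the abstract's point that the PLF dominates run time.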

Recent progress on methods for estimating and updating large phylogenies

TLDR
New methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, and methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes.

High-Performance Phylogenetic Inference

TLDR
Recent research on parallelization and performance optimization of state-of-the-art tree inference tools is surveyed, outlining advances in shared-memory multicore parallelization, optimizations for efficient GPU execution, as well as large-scale distributed-memory parallelization.

Scalable Phylogeny Reconstruction with Disaggregated Near-memory Processing

TLDR
Two near-memory processing models are described: one that addresses the problem of workload distribution to memory bricks and is particularly tailored toward larger genomes, and one that reduces overall memory requirements through memory-side data interpolation transparently to the application, thereby allowing the phylogeny size to scale to a larger number of organisms without requiring additional memory.

Morphological Characters Can Strongly Influence Early Animal Relationships Inferred from Phylogenomic Data Sets

TLDR
This work uses published phylogenomic data sets and a plethora of common methods, that is, likelihood models and their “equivalent” under parsimony: character weighting schemes, to quantify how increased taxon sampling can help stabilize phylogenetic inferences.

References

Efficient Detection of Repeating Sites to Accelerate Phylogenetic Likelihood Calculations

TLDR
This work presents a fast, novel method for identifying and omitting redundant operations in phylogenetic likelihood calculations, and assesses the performance improvement and memory saving attained by the method.

Trading Running Time for Memory in Phylogenetic Likelihood Computations

TLDR
It is demonstrated that, for a phylogeny with n species, only log(n) + 2 memory space is required for computing the likelihood, which is a promising result given the exponential growth of molecular datasets.

Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees

TLDR
A new search strategy is developed that can reduce the time required for tree inferences by more than 50% while yielding equally good trees for well-chosen starting trees, and issues pertaining to the numerical stability of the Γ model of rate heterogeneity on very large trees are addressed.

Computing the Phylogenetic Likelihood Function Out-of-Core

TLDR
It is found that RAM miss rates are below 10%, even if only 5% of the required data structures are held in RAM, and the proof-of-concept implementation runs more than 5 times faster than the respective standard implementation when paging is used.

The Phylogenetic Likelihood Library

TLDR
The Phylogenetic Likelihood Library is introduced, a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software that improves the sequential performance of current software by a factor of 2–10 while requiring only 1 month of programming time for integration.

Column sorting: rapid calculation of the phylogenetic likelihood function

TLDR
An algorithm is presented for exploiting this speed improvement via an application of graph theory to provide faster likelihood algorithms, which will allow likelihood methods to be applied to larger sets of taxa and to include more thorough searches of the tree topology space.

Load Balance in the Phylogenetic Likelihood Kernel

TLDR
These problems are described for the first time, the implications on the design of "classic" ML-based as well as Bayesian search algorithms are discussed, and an initial solution is provided that yields up to eight-fold improvements in speedup values on AMD Barcelona and Sun x4600 16-core systems for realistic application scenarios.

Multiple maxima of likelihood in phylogenetic trees: an analytic approach

TLDR
A new approach to calculating ML directly is reported, which is used to find large families of sequences that have multiple optima, including sequences with a continuum of optimal points, and implies that hill climbing techniques cannot be guaranteed to find the global ML point, even if it is unique.

Optimization strategies for fast detection of positive selection on phylogenetic trees

TLDR
Novel optimization techniques are introduced that substantially outperform both CodeML from the PAML package and the previously optimized sequential version SlimCodeML for more efficient estimation of the likelihood function on large-scale phylogenetic problems.

BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics

TLDR
BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference, is presented, which provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms.