Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees

@article{Liu2009RapidAA,
  title={Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees},
  author={Kevin Liu and Sindhu Raghavan and Serita M. Nelesen and C. Randal Linder and Tandy J. Warnow},
  journal={Science},
  year={2009},
  volume={324},
  pages={1561--1564}
}
Rapid Tree Building
Phylogenetic reconstruction is used to determine the relationships between organisms and requires an accurate alignment and analysis of multiple sequences. Iterative rounds of alignment and tree building are often necessary to prevent errors in the phylogeny estimate. One way to address this problem is to assess alignment and trees in a single step. However, efficient algorithms to analyze data sets of reasonable size have been lacking. Liu et al. (p. 1561; see the…
Co-estimation of Phylogeny-aware Alignment and Phylogenetic Tree
TLDR
Canopy, a new tool for parallelized iterative search for the optimal alignment, is developed; for all experimental settings tested, Canopy produces the most accurate sequence alignments, and the inferred phylogenetic trees are of comparable accuracy to those obtained with the leading alternative method, SATé.
Evaluating Sequence Alignments and Phylogenies New Methods and Large-Scale Comparisons
Phylogenetic trees are one of the most important representations of the evolutionary relationship between genomic sequences. Alternatively, their relatedness can be expressed by a matrix of pairwise…
DendroBLAST: Approximate Phylogenetic Trees in the Absence of Multiple Sequence Alignments
TLDR
It is demonstrated that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and it is proposed that these trees will provide a platform for improving and informing downstream bioinformatic analysis.
Alignment methods: strategies, challenges, benchmarking, and comparative overview.
  • A. Löytynoja
  • Computer Science, Medicine
  • Methods in molecular biology
  • 2012
TLDR
The inter-dependency of alignment and phylogeny can be resolved by joint estimation of the two; methods based on statistical models allow for inferring the alignment parameters from the data and correctly take into account the uncertainty of the solution but remain computationally challenging.
Multiple sequence alignment: a major challenge to large-scale phylogenetics
TLDR
It is shown that as the number of sequences increases, the number of alignment methods that can analyze the datasets decreases, and the most accurate alignment methods are unable to analyze the very largest datasets, so that only moderately accurate alignment methods can be used on the largest datasets.
SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.
TLDR
A modification to the original SATé algorithm that improves upon SATé (which is now called SATé-I) in terms of speed and of phylogenetic and alignment accuracy, and presents two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results.
Fast Algorithms for Large-Scale Phylogenetic Reconstruction
TLDR
Three novel fast phylogenetic algorithms are developed and LSHTree, the first sub-quadratic time algorithm with theoretical performance guarantees under a Markov model of sequence evolution, is applied to the problem of placing large numbers of short sequence reads onto a fixed phylogenetic tree.
Uniting Alignments and Trees
TLDR
A new approach is described for the coestimation of phylogenetic trees and sequence alignments for very large data sets; the two should be seen as parts of one question: the detection of sequence homology.
Generalized Bootstrap Supports for Phylogenetic Analyses of Protein Sequences Incorporating Alignment Uncertainty
TLDR
Unistrap, a novel approach that estimates the combined effect of alignment uncertainty and site sampling on phylogenetic tree branch supports, provides branch support estimates that take into account a larger fraction of the parameters impacting tree instability when processing datasets containing a large number of sequences.
Multiple sequence alignment: a major challenge to large-scale
TLDR
The estimation of highly accurate multiple sequence alignments is a major challenge for Tree of Life projects, and more generally for large-scale systematics studies, because the most accurate alignment methods are unable to analyze the very largest datasets the authors studied, so that only moderately accurate alignment methods can be used on the largest datasets.

References

SHOWING 1-10 OF 53 REFERENCES
Bayesian coestimation of phylogeny and sequence alignment
TLDR
A fully Bayesian Markov chain Monte Carlo method for coestimating phylogeny and sequence alignment, under the Thorne-Kishino-Felsenstein model of substitution and single nucleotide insertion-deletion (indel) events, and indicates that the patterns in reliability broadly correspond to structural features of the proteins, and thus provides biologically meaningful information which is not existent in the usual point-estimate of the alignment.
Simultaneous statistical multiple alignment and phylogeny reconstruction.
TLDR
This paper presents and discusses a strategy based on simulated annealing, which makes use of the TKF2 model to infer a phylogenetic tree for a set of DNA or protein sequences together with the sequences' indel history, i.e., their multiple alignment augmented with information about the positioning of insertion and deletion events in the tree.
Joint Bayesian estimation of alignment and phylogeny.
TLDR
The indel model makes use of affine gap penalties and considers indels of multiple letters, and makes the simplifying assumption that the indel process is identical on all branches, so the probability of a gap is independent of branch length.
Multiple sequence alignment accuracy and phylogenetic inference.
TLDR
Simulation of sequences containing insertion and deletion events was performed to determine the role that alignment accuracy plays during phylogenetic inference, and results indicated that as alignment error increases, topological accuracy decreases.
A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.
TLDR
This work has used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches.
Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments.
TLDR
Whether phylogenetic reconstruction improves after alignment cleaning is examined; cleaned alignments produce better topologies although, paradoxically, with lower bootstrap, which indicates that divergent and problematic alignment regions may lead, when present, to apparently better supported although, in fact, more biased topologies.
SATCHMO: Sequence Alignment and Tree Construction Using Hidden Markov Models
TLDR
Results using SATCHMO to identify protein domains are demonstrated on potassium channels, with implications for the mechanism by which tumor necrosis factor alpha affects potassium current.
Rose: generating sequence families
TLDR
A new probabilistic model of the evolution of RNA-, DNA-, or protein-like sequences and a software tool, Rose, that implements this model, suitable for the evaluation of methods in multiple sequence alignment computation and the prediction of phylogenetic relationships is presented.
Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting.
TLDR
It is proved that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates, and it is demonstrated that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods) that may produce trees with branches with biologically meaningless negative lengths.
An algorithm for progressive multiple alignment of sequences with insertions.
  • A. Löytynoja, N. Goldman
  • Medicine, Computer Science
  • Proceedings of the National Academy of Sciences of the United States of America
  • 2005
TLDR
This work describes a modification of the traditional alignment algorithm that can distinguish insertion from deletion and avoid repeated penalization of insertions and illustrates this method with a pair hidden Markov model that uses an evolutionary scoring function.