Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep
@article{Daskalakis2011PhylogeniesWB,
  title={Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep},
  author={Constantinos Daskalakis and Elchanan Mossel and S{\'e}bastien Roch},
  journal={SIAM J. Discret. Math.},
  year={2011},
  volume={25},
  pages={872--893}
}
We introduce a new phylogenetic reconstruction algorithm which, unlike most previous rigorous inference techniques, does not rely on assumptions regarding the branch lengths or the depth of the tree. The algorithm returns a forest which is guaranteed to contain all edges that are (1) sufficiently long and (2) sufficiently close to the leaves. How much of the true tree is recovered depends on the sequence length provided. The algorithm is distance-based and runs in polynomial time.
18 Citations
Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep
- Computer Science
- RECOMB
- 2009
We introduce a new phylogenetic reconstruction algorithm which, unlike most previous rigorous inference techniques, does not rely on assumptions regarding the branch lengths or the depth of the tree.…
Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing
- Computer Science
- WABI
- 2012
We present the first sub-quadratic time algorithm that with high probability correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid…
Phylogenetic mixtures: Concentration of measure in the large-tree limit
- Biology
- ArXiv
- 2011
Using concentration-of-measure techniques, it is shown that mixtures of large trees are typically identifiable, and sequence-length requirements for high-probability reconstruction are derived.
Fast Algorithms for Large-Scale Phylogenetic Reconstruction
- Computer Science
- 2013
Three novel fast phylogenetic algorithms are developed and LSHTree, the first sub-quadratic time algorithm with theoretical performance guarantees under a Markov model of sequence evolution, is applied to the problem of placing large numbers of short sequence reads onto a fixed phylogenetic tree.
Towards a Practical O(n log n) Phylogeny Algorithm
- Computer Science
- WABI
- 2011
A variety of extensions are presented which, while only slowing the algorithm down by a constant factor, make its performance nearly comparable to that of neighbour-joining, which requires $O(n^3)$ runtime.
Identifiability and inference of non-parametric rates-across-sites models on large-scale phylogenies
- Biology, Computer Science
- Journal of Mathematical Biology
- 2013
A new approach for estimating general rates-across-sites models, based on a novel algorithm that clusters sites according to their mutation rate, implies, in particular, that large phylogenies are typically identifiable under rate variation.
Coalescent-based species tree estimation: a stochastic Farris transform
- Biology
- ArXiv
- 2017
This paper proposes an algorithm for phylogeny reconstruction under the multispecies coalescent model with a standard model of site substitution, and obtains a new identifiability result of independent interest: for any species tree with $n \geq 3$ species, the rooted species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock.
Towards optimal distance functions for stochastic substitution models.
- Computer Science
- Journal of Theoretical Biology
- 2009
Estimating Optimal Species Trees from Incomplete Gene Trees Under Deep Coalescence
- Biology, Environmental Science
- J. Comput. Biol.
- 2012
This paper considers the problem of estimating species trees from gene trees and alignments for the general case where the gene trees or alignments can be incomplete, which means that not all the genes contain sequences for all the species.
References
Maximal Accurate Forests from Distance Matrices
- Computer Science
- RECOMB
- 2006
This work presents a fast-converging method for distance-based phylogenetic inference that is novel in two respects: first, it is the only method to guarantee accuracy when knowledge about the model tree, i.e., bounds on the edge lengths, is not assumed; and second, with high probability, no false assertions are made.
A short proof that phylogenetic tree reconstruction by maximum likelihood is hard
- Biology
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2006
A short proof is given that computing the maximum likelihood tree is NP-hard, exploiting a connection between likelihood and parsimony observed by Tuffley and Steel.
Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction
- Biology
- J. Comput. Biol.
- 1999
A simple method is presented, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution, and it is proved that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice.
A signal-to-noise analysis of phylogeny estimation by neighbor-joining: Insufficiency of polynomial length sequences.
- Biology
- Mathematical Biosciences
- 2006
Fast and reliable reconstruction of phylogenetic trees with very short edges
- Computer Science
- SODA '08
- 2008
This paper presents a fast converging reconstruction algorithm which returns a partially resolved topology containing all edges of the original tree whose weight exceeds some (non-trivial) lower bound, which is determined by the input sequence length, as well as some properties of the tree, such as its depth.
COMPUTATIONAL COMPLEXITY OF INFERRING PHYLOGENIES BY COMPATIBILITY
- Mathematics
- 1986
A well-known approach to inferring phylogenies involves finding a phylogeny with the largest number of characters that are perfectly compatible with it. Variations of this problem depend on whether…
Optimal phylogenetic reconstruction
- Mathematics, Computer Science
- STOC '06
- 2006
The proof of Steel's conjecture is completed, and a reconstruction algorithm using optimal (up to a multiplicative constant) sequence length is given, yielding an optimal reconstruction algorithm for the Jukes-Cantor model with short edges.
Inverting Random Functions II: Explicit Bounds for Discrete Maximum Likelihood Estimation, with Applications
- Mathematics
- SIAM J. Discret. Math.
- 2002
This paper studies inverting random functions under the maximum likelihood estimation (MLE) criterion in the discrete setting and provides explicit upper and lower bounds for MLE, both in the nonparametric and parametric setting, and gives applications to coin-tossing and phylogenetic tree reconstruction.
Nearly tight bounds on the learnability of evolution
- Computer Science
- Proceedings 38th Annual Symposium on Foundations of Computer Science
- 1997
A very simple algorithm, which is a variant on one of the most popular algorithms used by practitioners, converges on the true tree at a rate which differs from the optimum by a constant, and the learnability of each CF tree is sandwiched between two such simpler trees.