Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach

@article{Schwarz2010EvolutionaryDI,
  title={Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach},
  author={Roland F. Schwarz and William Fletcher and Frank F{\"o}rster and Benjamin Merget and Matthias Wolf and J{\"o}rg Schultz and Florian Markowetz},
  journal={PLoS ONE},
  year={2010},
  volume={5}
}
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a… 

Computational statistics in molecular phylogenetics
TLDR
A new, portable and flexible application, named INDELible, is implemented, which can be used to generate nucleotide, amino acid and codon sequence data by simulating indels (under several models of indel length distribution) as well as substitutions (under a rich repertoire of substitution models).
Graph Theory-Based Sequence Descriptors as Remote Homology Predictors
TLDR
This work first goes over the most popular AF approaches used for detecting homology signals within the twilight zone and then brings out the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success for identifying remote homologs.
A Not-So-Long Introduction to Computational Molecular Evolution.
TLDR
The emergence of likelihood-based methods is presented, the standard DNA substitution models are reviewed, and how model choice operates is introduced, before showing how state-of-the-art models take inspiration from diffusion theory to link population genetics and molecular evolution.
Next-generation phylogenomics
TLDR
It is argued that next-generation data require next-generation phylogenomics, including so-called alignment-free approaches, as well as other approaches to phylogenetics.
TI2BioP — Topological Indices to BioPolymers. A Graphical–Numerical Approach for Bioinformatics
TLDR
TI2BioP generally outperformed classical bioinformatics algorithms in the functional classification of Bacteriocins, ribonucleases III, genomic internal transcribed spacer II and adenylation domains of nonribosomal peptide synthetases (NRPS), allowing the detection of new members in these target gene/protein classes.
ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments
TLDR
Alvis is an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method, and combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection.
BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies
TLDR
BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm, is presented.
Phylogenetic Quantification of Intra-tumour Heterogeneity
TLDR
MEDICC is presented, a method for phylogenetic reconstruction and heterogeneity quantification based on a Minimum Event Distance for Intra-tumour Copy-number Comparisons that outperforms state-of-the-art competitors in reconstruction accuracy, and additionally allows unbiased numerical quantification of tumour heterogeneity.
MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution
TLDR
MEDICC2 is presented, a new phylogenetic algorithm for allele-specific SCNA data based on a minimum-evolution criterion that explicitly models clonal and subclonal WGD events and that takes parallel evolutionary events into account, and can identify WGD events and quantify SCNA burden in single-sample studies and infer phylogenetic trees and ancestral genomes in multi-sample scenarios.
Hotspots for mutations in the SARS-CoV-2 spike glycoprotein: a correspondence analysis
TLDR
Higher rate of RBD maintenance than furin cleavage site was predicted, and the accumulation of substitutions reinforces the probability of the multi-host circulation of the virus and emphasizes the enduring evolutionary events.

References

SHOWING 1-10 OF 67 REFERENCES
Pattern-Based Phylogenetic Distance Estimation and Tree Reconstruction
TLDR
An alignment-free method that calculates phylogenetic distances using a maximum-likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences is developed, which yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods.
The Average Common Substring Approach to Phylogenomic Reconstruction
TLDR
The core of the method is a new measure of pairwise distances between sequences, based on computing the average lengths of maximum common substrings, which is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy).
Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments.
TLDR
Whether phylogenetic reconstruction improves after alignment cleaning is examined; cleaned alignments produce better topologies although, paradoxically, with lower bootstrap support, which indicates that divergent and problematic alignment regions may lead, when present, to apparently better supported although, in fact, more biased topologies.
Is Multiple-Sequence Alignment Required for Accurate Inference of Phylogeny?
TLDR
This work mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased the understanding of their behavior in response to biologically important parameters, finding the optimal word length k of word-based methods to be stable across various data sets, and providing parameter ranges for two different alphabets.
Probabilistic Phylogenetic Inference with Insertions and Deletions
TLDR
A probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character is described.
Indel-based evolutionary distance and mouse-human divergence.
We propose a method for estimating the evolutionary distance between DNA sequences in terms of insertions and deletions (indels), defined as the per site number of indels accumulated in the course of
The Impact of Multiple Protein Sequence Alignment on Phylogenetic Estimation
TLDR
It is observed that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low.
A new sequence distance measure for phylogenetic tree construction
TLDR
A new sequence distance measure based on the relative information between the sequences using Lempel-Ziv complexity is proposed, which can be used to construct phylogenetic trees.
Toward Extracting All Phylogenetic Information from Matrices of Evolutionary Distances
TLDR
A statistical analysis of certain distance-based techniques indicates that their data requirement for large evolutionary trees essentially matches the conjectured performance of maximum likelihood methods—challenging the idea that summary statistics lead to suboptimal analyses.
Multiple sequence alignment accuracy and phylogenetic inference.
TLDR
Simulation of sequences containing insertion and deletion events was performed to determine the role that alignment accuracy plays during phylogenetic inference, and results indicated that as alignment error increases, topological accuracy decreases.