Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis

@article{Lytynoja2008PhylogenyAwareGP,
  title={Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis},
  author={Ari L{\"o}ytynoja and Nick Goldman},
  journal={Science},
  year={2008},
  volume={320},
  pages={1632--1635}
}
Genetic sequence alignment is the basis of many evolutionary and comparative studies, and errors in alignments lead to errors in the interpretation of evolutionary information in genomes. Traditional multiple sequence alignment methods disregard the phylogenetic implications of gap patterns that they create and infer systematically biased alignments with excess deletions and substitutions, too few insertions, and implausible insertion-deletion–event histories. We present a method that prevents…
Phylogeny-aware alignment with PRANK.
  • A. Löytynoja
  • Computer Science, Medicine
  • Methods in molecular biology
  • 2014
TLDR
The phylogeny-aware alignment algorithm implemented in the program PRANK has been shown to produce good alignments for evolutionary inferences but can be sensitive to errors in the guide phylogeny and to violations of the underlying assumptions about the origin and patterns of gaps.
Phylogeny-Aware Alignment with PRANK and PAGAN.
  • A. Löytynoja
  • Medicine, Computer Science
  • Methods in molecular biology
  • 2021
TLDR
To mitigate the effects of model violations, the phylogeny-aware alignment algorithm has been re-implemented in the program PAGAN, which can model and accumulate evidence from more complex gap structures than PRANK does and incorporate this uncertainty in the inferred ancestral sequences.
Alignment methods: strategies, challenges, benchmarking, and comparative overview.
  • A. Löytynoja
  • Computer Science, Medicine
  • Methods in molecular biology
  • 2012
TLDR
The inter-dependency of alignment and phylogeny can be resolved by joint estimation of the two; methods based on statistical models allow for inferring the alignment parameters from the data and correctly take into account the uncertainty of the solution but remain computationally challenging.
The deterministic effects of alignment bias in phylogenetic inference
TLDR
Simultaneous alignment using the similarity criterion should be applied whenever possible for sequence-based phylogenetic analyses, rather than relying upon methods that integrate alignment and tree search into a single step without accounting for alignment uncertainty.
Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy
TLDR
It is shown that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment.
Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy
TLDR
An alignment-integrated ASR approach that combines information from many different sequence alignments is introduced; it improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment.
Phylogenetic assessment of alignments reveals neglected tree signal in gaps
TLDR
This study provides the broad community relying on sequence alignment with important practical recommendations, sets superior standards for assessing alignment accuracy, and paves the way for the development of phylogenetic inference methods of significantly higher resolution.
The Impact of Multiple Protein Sequence Alignment on Phylogenetic Estimation
TLDR
It is observed that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low.
Alignment and Mapping
TLDR
Alignments represent hypotheses of positional homologies between nucleotides or amino acids of sequences, which can be used to define scoring functions to compare the quality of any two pairwise alignments.
The Cumulative Indel Model: fast and accurate statistical evolutionary alignment.
TLDR
The cumulative indel model and adaptive banding can improve the performance of alignment and phylogenetic methods and lead to fast and accurate pairwise alignment inference.

References

SHOWING 1-10 OF 23 REFERENCES
An algorithm for progressive multiple alignment of sequences with insertions.
  • A. Löytynoja, N. Goldman
  • Medicine, Computer Science
  • Proceedings of the National Academy of Sciences of the United States of America
  • 2005
TLDR
This work describes a modification of the traditional alignment algorithm that can distinguish insertion from deletion and avoid repeated penalization of insertions and illustrates this method with a pair hidden Markov model that uses an evolutionary scoring function.
Multiple sequence alignment accuracy and evolutionary distance estimation
TLDR
The bias in distance estimation appears to be a direct result of the standard greedy progressive algorithm used by many multiple alignment methods, which has implications for choosing new taxa and genomes to sequence when resources are limited.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
TLDR
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Alignment Uncertainty and Genomic Analysis
TLDR
Using genomic data from seven yeast species, it is shown that uncertainty in the alignment can lead to several problems, including different alignment methods resulting in different conclusions.
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures
TLDR
This work uses the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly, and identifies several classes of pre- and post-transcriptional regulatory motifs, and predicts individual motif instances with high confidence.
T-Coffee: A novel method for fast and accurate multiple sequence alignment.
TLDR
A new method for multiple sequence alignment that provides a dramatic improvement in accuracy with a modest sacrifice in speed as compared to the most commonly used alternatives but avoids the most serious pitfalls caused by the greedy nature of this algorithm.
MAFFT version 5: improvement in accuracy of multiple sequence alignment
TLDR
Improvement in accuracy was generally observed for most methods, but remarkably large for the new options of MAFFT proposed here, which showed higher accuracy than currently available methods including TCoffee version 2 and CLUSTAL W in benchmark tests consisting of alignments of >50 sequences.
Increased taxon sampling is advantageous for phylogenetic inference.
TLDR
A recent paper on the subject of taxon addition (Rosenberg and Kumar, 2001) concludes that increased taxon sampling is of little benefit to phylogenetic inference when compared to increasing sequence length, but reanalysis of the paper's simulated data indicates that increased taxon sampling is highly beneficial for phylogenetic inference.
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
TLDR
Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.
MUSCLE: a multiple sequence alignment method with reduced time and space complexity
  • R. Edgar
  • Computer Science, Medicine
  • BMC Bioinformatics
  • 2004
TLDR
MUSCLE offers a range of options that provide improved speed and/or alignment accuracy compared with currently available programs, and a new option, MUSCLE-fast, designed for high-throughput applications.