Comparison of DNA sequences with protein sequences.

@article{Pearson1997ComparisonOD,
  title={Comparison of DNA sequences with protein sequences.},
  author={William R. Pearson and Todd Charles Wood and Z Zhang and Webb Miller},
  journal={Genomics},
  year={1997},
  volume={46},
  number={1},
  pages={24--36}
}
The FASTA package of sequence comparison programs has been expanded to include FASTX and FASTY, which compare a DNA sequence to a protein sequence database, translating the DNA sequence in three frames and aligning the translated DNA sequence to each sequence in the protein database, allowing gaps and frameshifts. Also new are TFASTX and TFASTY, which compare a protein sequence to a DNA sequence database, translating each sequence in the DNA database in six frames and scoring alignments with… 
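The translation step mentioned in the abstract can be illustrated with a minimal Python sketch. It covers only the conceptual translation of a DNA sequence in three forward frames (as described for FASTX/FASTY) and in six frames (as described for TFASTX/TFASTY); it does not implement the gapped, frameshift-aware alignment or scoring used by those programs. The function names, the use of the standard genetic code, and the choice of `X` for codons containing ambiguous bases are assumptions made for illustration and are not taken from the FASTA package.

```python
# Illustrative sketch only: frame translation, not the FASTX/FASTY alignment.
# All names below are hypothetical helpers, not part of the FASTA package.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO
    )
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna starting at offset frame (0, 1, or 2).

    Trailing bases that do not fill a codon are ignored; codons with
    bases outside A/C/G/T are reported as 'X' (an assumption).
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(dna.upper()))


def three_frames(dna: str) -> list[str]:
    """Translations of the three forward frames."""
    return [translate(dna, frame) for frame in range(3)]


def six_frames(dna: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    return three_frames(dna) + three_frames(reverse_complement(dna))


if __name__ == "__main__":
    for index, protein in enumerate(six_frames("ATGGCCTAA")):
        print(index, protein)
```

A frameshift-aware aligner goes further than this: instead of committing to one frame, it lets the alignment switch between frames at a penalty, which is what allows a sequencing error to be bridged.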


FASTA Search Programs
TLDR
The FASTA programs provide flexible and rigorous alternatives to BLAST for protein, translated-DNA and DNA alignment and use external annotations to modify aligned sequences and to partition similarity scores.
BLAST and FASTA similarity searching for multiple sequence alignment.
  • W. Pearson
  • Biology
    Methods in molecular biology
  • 2014
TLDR
Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
Direct mapping and alignment of protein sequences onto genomic sequence
TLDR
Exon-level and gene-level accuracies of Spaln are significantly higher than those obtained by the best available methods of the same type, particularly when the query and the target are distantly related.
Flexible sequence similarity searching with the FASTA3 program package.
  • W. Pearson
  • Biology
    Methods in molecular biology
  • 2000
The FASTA3 and FASTA2 packages provide a flexible set of sequence-comparison programs that are particularly valuable because of their accurate statistical estimates and high-quality alignments.
GeneTack: Frameshift Identification in Protein-Coding Sequences by the Viterbi Algorithm
TLDR
The program can identify spurious predictions made by a conventional gene-finding program misled by a frameshift. Its accuracy compares favorably with that of the FSFind-BLAST program, which uses protein database search to verify predicted frameshifts, even though the program itself does not use external evidence.
Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps
TLDR
A new convention for encoding a DNA sequence into a series of 23 possible letters (translated codon, or tron, code) was devised to improve this type of analysis. A dynamic programming algorithm was developed to align a DNA sequence with a protein sequence or profile so that the spliced and translated sequence optimally matches the reference, in the same way as standard protein sequence alignment, while allowing for long gaps.
Frameshift alignment: statistics and post-genomic applications
TLDR
A method to estimate the statistical significance of frameshift alignments, similar to classic BLAST statistics, is described. The results suggest that metagenomic analysis needs to use frameshift alignment to derive accurate results.
FramePlus: aligning DNA to protein sequences
TLDR
A new algorithm, FramePlus, for DNA-protein sequence alignment was found to be somewhat better than other algorithms in the presence of moderate and high rates of frameshift errors, and comparable to Translated Search in the absence of sequencing errors.
Finding Protein and Nucleotide Similarities with FASTA
  • W. Pearson
  • Computer Science
    Current protocols in bioinformatics
  • 2016
TLDR
These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons.
Getting More from Less
TLDR
Two novel sequence similarity search algorithms, FASTS and FASTF, that use multiple short peptide sequences to identify homologous sequences in protein or DNA databases are described, allowing proteomic identification from organisms whose genomes have not been sequenced.

References

SHOWING 1-10 OF 49 REFERENCES
Alignments of DNA and protein sequences containing frameshift errors
TLDR
A new algorithm is presented which can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases and performs significantly better than any previously reported method.
Improved tools for biological sequence comparison.
  • W. Pearson, D. Lipman
  • Biology, Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 1988
TLDR
Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Rapid and sensitive sequence comparison with FASTP and FASTA.
Aligning a DNA sequence with a protein sequence
TLDR
Presents algorithms for computing optimal alignments under several definitions of DNA-protein alignment, verifies sufficient conditions for the equivalence of certain definitions, describes techniques for efficient implementation, and discusses experience with these ideas in a new release of the FASTA suite of database-searching programs.
Aligning a DNA Sequence with a Protein Sequence
TLDR
Presents algorithms for computing optimal alignments under several definitions of DNA-protein alignment, verifies sufficient conditions for the equivalence of certain definitions, describes techniques for efficient implementation, and discusses experience with these ideas in a new release of the FASTA suite of database-searching programs.
Methods for comparing a DNA sequence with a protein sequence
We describe two methods for constructing an optimal global alignment of, and an optimal local alignment between, a DNA sequence and a protein sequence. The alignment model of the methods addresses
Effective protein sequence comparison.
Identification of protein coding regions by database similarity search
TLDR
The computer program BLASTX performed conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step and was characterized as appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
Aligning two sequences within a specified diagonal band
TLDR
An algorithm for aligning two sequences within a diagonal band that requires only O(NW) computation time and O(N) space is described, which allows longer sequences to be aligned and allows optimization within wider bands, which can include longer gaps.
Comparison of methods for searching protein sequence databases
  • W. Pearson
  • Computer Science
    Protein science : a publication of the Protein Society
  • 1995
TLDR
Search sensitivity with either the Smith‐Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45–55, and optimized gap penalties instead of the conventional PAM250 matrix.