Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms.

@article{Pearson1991SearchingPS,
  title={Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms.},
  author={William R. Pearson},
  journal={Genomics},
  year={1991},
  volume={11},
  number={3},
  pages={635--650}
}
  • W. Pearson
  • Published 1 November 1991
  • Biology
  • Genomics
Sensitivity and selectivity in protein similarity searches: a comparison of Smith-Waterman in hardware to BLAST and FASTA.
TLDR
It is demonstrated here that the Smith-Waterman (S-W) dynamic programming method and the optimized version of FASTA are significantly better able to distinguish true similarities from statistical noise than is the popular database search tool BLAST.
SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments
TLDR
A new algorithm has been devised for the computation of a gapped alignment of two sequences using dynamic programming to build an accurate alignment based on the fragments initially identified.
Comparison of methods for searching protein sequence databases
  • W. Pearson
  • Computer Science
    Protein science : a publication of the Protein Society
  • 1995
TLDR
Search sensitivity with either the Smith‐Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45–55, and optimized gap penalties instead of the conventional PAM250 matrix.
A sequence property approach to searching protein databases.
TLDR
This work shows that members of structural protein families have a low mutual PropSearch distance when the weights are optimized to discriminate maximally between structural families, and demonstrates the results of database searches using the PropSearch method.
Comparing algorithms for large-scale sequence analysis
TLDR
This paper ports both Smith-Waterman and BLAST to the Frontier platform, enabling the efficient use of these algorithms on large sequence databases, and presents a novel visualization tool along with quantitative metrics for comparing the results of alternative sequence alignment algorithms.
A structure-based method for protein sequence alignment
MOTIVATION: With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that
Alignment algorithms revisited: Alignment algorithms for low similarity protein sequence comparisons
  • M. Wise
  • Computer Science
    2003 European Control Conference (ECC)
  • 2003
TLDR
This study re-examines the efficacy of local versus global alignment algorithms and finds that the Smith-Waterman algorithm is most effective when two proteins have a common domain or have the same function.
Testing statistical significance scores of sequence comparison methods with structure similarity
TLDR
Two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score, and the compute-intensive Z-score does not have a clear advantage over the e-value.
Increased Coverage Obtained by Combination of Methods for Protein Sequence Database Searching
TLDR
The union of results by BLAST (p-value) and FASTA at an equal p-value cutoff gave significantly better coverage than either method individually, and the best overall performance was obtained from the intersection of the results from SSEARCH and the GSRCH62 global alignment method.
FASTA Search Programs
TLDR
The FASTA programs provide flexible and rigorous alternatives to BLAST for protein, translated-DNA and DNA alignment and use external annotations to modify aligned sequences and to partition similarity scores.

References

SHOWING 1-10 OF 24 REFERENCES
Improved tools for biological sequence comparison.
  • W. Pearson, D. Lipman
  • Biology, Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 1988
TLDR
Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Rapid and sensitive protein similarity searches.
TLDR
An algorithm was developed which facilitates the search for similarities between newly determined amino acid sequences and sequences already available in databases and increases sensitivity by giving high scores to those amino acid replacements which occur frequently in evolution.
Profile analysis: detection of distantly related proteins.
TLDR
Tests with globin and immunoglobulin sequences show that profile analysis can distinguish all members of these families from all other sequences in a database containing 3800 protein sequences.
Improved sensitivity of biological sequence database searches
TLDR
The sensitivity of DNA and protein sequence database searches is increased by allowing similar but non-identical amino acids or nucleotides to match, and the search can be sped up by matching k-tuples (words) instead of individual residues.
Protein database searches for multiple alignments.
  • S. Altschul, D. Lipman
  • Computer Science, Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 1990
TLDR
An algorithm described here permits the current protein database to be searched for three-sequence alignments in less than 4 min, and has revealed a variety of subtle relationships that pairwise search methods would be unable to detect.
Study of protein sequence comparison metrics on the connection machine CM-2
TLDR
Software tools are developed for rapid, large-scale protein sequence comparisons on databases of amino acid sequences using a data-parallel computer architecture, enabling biologists to find relevant similarities much more quickly and to evaluate many different comparison metrics in a reasonable time.
Automatic generation of primary sequence patterns from sets of related protein sequences.
  • R. F. Smith, T. Smith
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 1990
TLDR
A computer algorithm is developed that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family through clustering the pairwise similarity scores among a set of related sequences.