A protein alignment scoring system sensitive at all evolutionary distances

@article{Altschul2004APA,
  title={A protein alignment scoring system sensitive at all evolutionary distances},
  author={Stephen F. Altschul},
  journal={Journal of Molecular Evolution},
  year={1993},
  volume={36},
  pages={290-300}
}
  • S. Altschul
  • Published 1 March 1993
  • Biology, Computer Science
  • Journal of Molecular Evolution
Summary: Protein sequence alignments generally are constructed with the aid of a “substitution matrix” that specifies a score for aligning each pair of amino acids. Assuming a simple random protein model, it can be shown that any such matrix, when used for evaluating variable-length local alignments, is implicitly a “log-odds” matrix, with a specific probability distribution for amino acid pairs to which it is uniquely tailored. Given a model of protein evolution from which such distributions may…
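The log-odds statement in the summary can be illustrated with a small numerical sketch. The example below is not taken from the paper: it assumes a toy two-letter alphabet with made-up background frequencies `p` and target pair frequencies `q`, and the function names `log_odds_scores` and `implied_lambda` are hypothetical. It builds scores s_ij = ln(q_ij / (p_i p_j)) / λ, then goes the other way, recovering the scale λ and the implied target frequencies from the scores and background alone by solving Σ p_i p_j exp(λ s_ij) = 1.

```python
import math

def log_odds_scores(q, p, lam=math.log(2)):
    """Scores s[i][j] = ln(q[i][j] / (p[i] * p[j])) / lam (bits when lam = ln 2)."""
    n = len(p)
    return [[math.log(q[i][j] / (p[i] * p[j])) / lam for j in range(n)]
            for i in range(n)]

def implied_lambda(s, p, tol=1e-12):
    """Positive root of sum_ij p_i p_j exp(lam * s_ij) = 1, found by bisection.

    Assumes the expected score under the background is negative and at
    least one score is positive, so that a unique positive root exists.
    """
    n = len(p)

    def f(lam):
        return sum(p[i] * p[j] * math.exp(lam * s[i][j])
                   for i in range(n) for j in range(n)) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0:          # grow the bracket until f(hi) > 0
        hi *= 2.0
    while hi - lo > tol:       # f < 0 on (0, root), f > 0 beyond it
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Toy data (illustrative assumptions, not values from the paper).
p = [0.6, 0.4]                       # background letter frequencies
q = [[0.5, 0.1], [0.1, 0.3]]         # target pair frequencies, sum to 1

s = log_odds_scores(q, p)            # unrounded scores in bits
lam = implied_lambda(s, p)           # scale recovered from scores alone
q_back = [[p[i] * p[j] * math.exp(lam * s[i][j]) for j in range(2)]
          for i in range(2)]         # implied target frequencies
```

Because the scores here are left unrounded, the recovered scale equals ln 2 and `q_back` reproduces `q` up to floating-point error. Rounding the scores to integers, as practical matrices do, would shift both slightly.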
Sequence Alignment with an Appropriate Substitution Matrix
TLDR
An algorithm for selecting an appropriate substitution matrix at given gap penalties for computing an optimal local alignment between two sequences is described and a substitution matrix that leads to the maximum alignment similarity score is selected among substitution matrices at various evolutionary distances.
On the significance of sequence alignments when using multiple scoring matrices
TLDR
The multiple testing problem that occurs when several scoring matrices for local sequence alignment are used is studied, and a simple Bonferroni correction of the p-values is considered and its accuracy investigated.
Protein database searches using compositionally adjusted substitution matrices
TLDR
This work has recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing compositions.
Estimating statistical significance of local protein profile-profile alignments
TLDR
It is shown that improvements in statistical accuracy and sensitivity and high-quality alignment rate result from statistically characterizing alignments by establishing the dependence of statistical parameters on various measures associated with both individual and pairwise profile characteristics.
Parameterizing sequence alignment with an explicit evolutionary model
TLDR
This work identifies and implements several probabilistic evolutionary models compatible with the affine-cost insertion/deletion model used in standard pairwise sequence alignment, including one evolutionary model compatible with symmetric pair HMMs that are the basis for Smith-Waterman pairwise alignment, and two evolutionary models compatible with standard profile-based alignment.
Dynamic use of multiple parameter sets in sequence alignment
TLDR
An alignment algorithm to allow dynamic use of multiple parameter sets with different levels of stringency in computation of an optimal alignment of two sequences is described.
The compositional adjustment of amino acid substitution matrices
TLDR
Composition-specific substitution matrix adjustment is shown to be of utility for comparing compositionally biased proteins, including those of organisms with nucleotide-biased, and therefore codon-biased, genomes or isochores.
Statistics of local multiple alignments
TLDR
This work presents and justifies a significance score for multiple segments of a local multiple alignment and demonstrates its usefulness in distinguishing high and moderate quality multiple alignments from low quality ones, with supporting experiments on orthologous vertebrate promoter sequences.
Pairwise alignment incorporating dipeptide covariation
TLDR
The analysis indicates that local correlations between substitutions are not strong on the average, and incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection.
Statistical evaluation of pairwise protein sequence comparison with the Bayesian bootstrap
TLDR
An unbiased statistical evaluation based on the Bayesian bootstrap, a resampling method operationally similar to the standard bootstrap, is developed, showing that the underlying structure within benchmark databases causes Efron's standard, non-parametric bootstrap to be biased.

References

Amino acid substitution matrices from an information theoretic perspective
Aligning amino acid sequences: Comparison of commonly used methods
Summary: We examined two extensive families of protein sequences using four different alignment schemes that employ various degrees of “weighting” in order to determine which approach is most sensitive
Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.
  • S. Karlin, S. Altschul
  • Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 1990
TLDR
Using an appropriate random model, this work presents a theory that provides precise numerical formulas for assessing the statistical significance of any region with high aggregate score and examples are given of applications to a variety of protein sequences, highlighting segments with unusual biological features.
Detecting homology of distantly related proteins with consensus sequences.
  • L. Patthy
  • Biology
    Journal of molecular biology
  • 1987
22 A Model of Evolutionary Change in Proteins
TLDR
The body of data used in this study includes 1,572 changes of closely related proteins appearing in the Atlas volumes through Supplement 2 and the mutation data were accumulated from the phylogenetic trees and from a few pairs of related sequences.
Improved tools for biological sequence comparison.
  • W. Pearson, D. Lipman
  • Biology, Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 1988
TLDR
Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Profile analysis: detection of distantly related proteins.
TLDR
Tests with globin and immunoglobulin sequences show that profile analysis can distinguish all members of these families from all other sequences in a database containing 3800 protein sequences.
The rapid generation of mutation data matrices from protein sequences
TLDR
An efficient means for generating mutation data matrices from large numbers of protein sequences is presented, by means of an approximate peptide-based sequence comparison algorithm, which is fast enough to process the entire SWISS-PROT databank in 20 h on a Sun SPARCstation 1 and to generate a matrix from a specific family or class of proteins in minutes.