Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices

@article{Agrawal2011PairwiseSS,
  title={Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices},
  author={Ankit Agrawal and Xiaoqiu Huang},
  journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  year={2011},
  volume={8},
  pages={194--205}
}
Pairwise sequence alignment is a central problem in bioinformatics, which forms the basis of various other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores… 
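The idea of scoring one pair of sequences against its own null distribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: one sequence is shuffled repeatedly, each shuffle is aligned to the other sequence, an extreme value (Gumbel) distribution is fitted to the resulting scores, and the observed score is converted to a P-value. The match/mismatch/gap scoring, the method-of-moments fit, and the names `local_alignment_score` and `pairwise_p_value` are all hypothetical choices made for brevity; the sequence-specific and position-specific substitution matrices studied in the paper are not modeled here.

```python
import math
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not a real
    substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def pairwise_p_value(seq1, seq2, n_shuffles=200, seed=0):
    """Estimate a P-value for the alignment score of one sequence pair.

    Null scores come from aligning seq1 against shuffled copies of seq2.
    A Gumbel distribution is fitted by the method of moments (an
    assumption chosen for simplicity).
    """
    rng = random.Random(seed)
    observed = local_alignment_score(seq1, seq2)
    letters = list(seq2)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(local_alignment_score(seq1, "".join(letters)))

    sd = statistics.pstdev(null_scores)
    if sd == 0:
        raise ValueError("null scores have zero variance; cannot fit a Gumbel")
    beta = sd * math.sqrt(6) / math.pi                     # Gumbel scale
    mu = statistics.fmean(null_scores) - 0.5772156649 * beta  # Gumbel location
    z = max((observed - mu) / beta, -50.0)                 # clamp to avoid overflow
    # P(S >= observed) = 1 - exp(-exp(-z)), computed stably for tiny values.
    p_value = -math.expm1(-math.exp(-z))
    return observed, p_value


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    score, p = pairwise_p_value(query, query)
    print(f"score={score} p={p:.3g}")
```

The number of shuffles (`n_shuffles`) trades estimation quality against run time, since every shuffle costs one full alignment.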

Sequence-specific sequence comparison using pairwise statistical significance.
TLDR
This chapter presents a summary of recent advances in accurately estimating statistical significance of pairwise local alignment for the purpose of identifying related sequences, by making the sequence comparison process more sequence specific.
Derived distribution points heuristic for fast pairwise statistical significance estimation
TLDR
A simple heuristic, called the Derived Distribution Points (DDP) heuristic, is proposed. It is designed around the features of the pairwise statistical significance estimation procedure and has been shown to significantly improve the quality of pairwise statistical significance estimates (evaluated in terms of retrieval accuracy) even when using low values of N.
MPIPairwiseStatSig: parallel pairwise statistical significance estimation of local sequence alignment
TLDR
This paper presents a parallel algorithm for pairwise statistical significance estimation, called MPIPairwiseStatSig, implemented in C using MPI. Distributing the most compute-intensive portions of the pairwise statistical significance estimation procedure across multiple processors has been shown to result in near-linear speed-ups for the application.
Par-PSSE: Software for Pairwise statistical significance estimation in parallel for local sequence alignment
TLDR
This paper presents a software library for estimating pairwise statistical significance in parallel, named Par-PSSE, implemented in C++ using the OpenMP and MPI paradigms and their hybrids, and applies the parallelization technique to estimate non-conservative PSS using standard, sequence-specific, and position-specific substitution matrices.
Protein sequence alignment with family-specific amino acid similarity matrices
TLDR
The results of this work indicate that using family-specific similarity matrices significantly improves the quality of the alignment of homologous sequences over the traditional sequence alignment based on a single general-purpose similarity matrix.
FPGA architecture for pairwise statistical significance estimation
TLDR
This work develops algorithms for sequence-specific strategies for hardware acceleration of pairwise sequence alignment in conjunction with statistical significance estimation, and presents a 'flexible array' hardware architecture that provides a scalable systolic array suitable for both long and short sequences.
Parallel pairwise statistical significance estimation of local sequence alignment using Message Passing Interface library
TLDR
This paper presents a parallel algorithm for pairwise statistical significance estimation, called MPIPairwiseStatSig, implemented in C using the MPI library, and applies the parallelization technique to estimate non-conservative pairwise statistical significance using standard, sequence-specific, and position-specific substitution matrices.
Enhancing Parallelism of Pairwise Statistical Significance Estimation for Local Sequence Alignment
TLDR
This paper evaluates the use of OpenMP, MPI and hybrid paradigms to accelerate the estimation of PSS of local sequence alignment and achieves a speedup of up to 113.10× using 128 cores.
PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids
TLDR
PR2ALIGN will be helpful for researchers who wish to align amino acid sequences by using flexible user-specified alignment scoring functions based on the biochemical properties of amino acids instead of the amino acid substitution matrix.
Chapter 3 Mining Genomic Sequence Data for Related Sequences Using Pairwise Statistical Significance
TLDR
This chapter presents the algorithm of pairwise statistical significance, then describes several high-performance algorithms that enable significant acceleration of pairwise statistical significance estimation.
...
...

References

Pairwise Statistical Significance of Local Sequence Alignment Using Substitution Matrices with Sequence-Pair-Specific Distance
TLDR
It is shown that pairwise statistical significance using substitution matrices with sequence-pair-specific distance performs significantly better than using a fixed distance.
Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty
TLDR
The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using several parameter sets.
Conservative, Non-conservative and Average Pairwise Statistical Significance of Local Sequence Alignment
TLDR
Experimental results for homology detection reveal that the proposed measures give at least comparable or significantly better retrieval accuracy than original pairwise statistical significance and database statistical significance reported by BLAST, PSI-BLAST, and SSEARCH.
Pairwise statistical significance and empirical determination of effective gap opening penalties for protein local sequence alignment
TLDR
Results indicate that pairwise statistical significance using standard substitution matrices is significantly better than database statistical significance reported by BLAST and PSI-BLAST, and that it is comparable to and at times significantly better than SSEARCH.
Statistical significance in biological sequence analysis
TLDR
This review discusses the general role of P-value estimation in sequence analysis, and gives a description of theoretical methods and computational approaches to the estimation of statistical significance for important classes of sequence analysis problems.
Statistical Significance in Biological Sequence Comparison
TLDR
The chapter reviews the role of statistical significance estimates in biological sequence comparison, focusing on local similarity searches, and it is shown that, with the exception of highly biased protein sequences and sequences with low-complexity regions, real, unrelated protein sequences behave very similarly to sequences generated randomly.
Making Sense of Score Statistics for Sequence Alignments
TLDR
This paper aims to highlight a few of the principles that should be kept in mind when evaluating the statistical significance of alignments between sequences, and shows that the alignment statistics can undergo an abrupt phase transition.
Empirical statistical estimates for sequence similarity searches.
The FASTA package of sequence comparison programs has been modified to provide accurate statistical estimates for local sequence similarity scores with gaps. These estimates are derived using the
Rapid and accurate estimates of statistical significance for sequence data base searches.
  • M. Waterman, M. Vingron
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 1994
TLDR
This work presents a practical method to approximate the probability that a local alignment score is a result of chance alone, and presents applications to data base searching and the analysis of pairwise and self-comparisons of proteins.
Toward an accurate statistics of gapped alignments
...
...