# A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

@article{Eddy2008APM, title={A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation}, author={Sean R. Eddy}, journal={PLoS Computational Biology}, year={2008}, volume={4} }

Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I…
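
As a rough illustration of how a Gumbel-distributed score is converted into a significance estimate, the sketch below evaluates the standard Gumbel survival function $P(S \ge x) = 1 - \exp(-e^{-\lambda (x - \mu)})$. The function names, the parameter values for `mu` and `lam`, and the database size are hypothetical and chosen only for demonstration; they are not taken from the paper.

```python
import math

def gumbel_pvalue(score, mu, lam):
    """P(S >= score) for a Gumbel distribution with location mu and slope lam."""
    # Survival function: 1 - exp(-exp(-lam * (score - mu))).
    # -expm1(-t) evaluates 1 - exp(-t) without losing precision when t is tiny.
    t = math.exp(-lam * (score - mu))
    return -math.expm1(-t)

def evalue(score, mu, lam, n_targets):
    """Expected number of chance hits at or above `score` among n_targets comparisons."""
    return n_targets * gumbel_pvalue(score, mu, lam)

if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    mu, lam, n_targets = -8.0, 0.69, 100_000
    for s in (0.0, 10.0, 20.0):
        print(f"score={s:5.1f}  P={gumbel_pvalue(s, mu, lam):.3e}  "
              f"E={evalue(s, mu, lam, n_targets):.3e}")
```

For scores well above `mu`, the P-value is approximately `exp(-lam * (score - mu))`, which is why the tail behaviour depends so strongly on λ.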

## 287 Citations

Significance of Gapped Sequence Alignments

- Biology, Computer Science
- J. Comput. Biol.
- 2008

This work draws random samples directly from a well-chosen importance-sampling probability distribution to approximate alignment score significance, and shows that the extreme value significance statistic for the local alignment model that is examined does not follow a Gumbel distribution.

Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling

- Computer Science
- BMC Bioinformatics
- 2010

An efficient and general method to compute the score distribution to any desired accuracy, combining Markov chain Monte Carlo simulations with importance sampling and generalized ensembles, and extended to a model of transmembrane proteins.

A new generation of homology search tools based on probabilistic inference.

- Computer Science
- Genome informatics. International Conference on Genome Informatics
- 2009

The aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference-based methods, ushering in a new generation of more powerful homology search tools.

How sequence alignment scores correspond to probability models

- Biology, Computer Science
- bioRxiv
- 2019

This study shows how multiple models correspond to one set of scores and clarifies the statistical basis of sequence alignment, which involves judging whether whole sequences are related versus finding related parts.

Estimating statistical significance of local protein profile-profile alignments

- Biology
- bioRxiv
- 2018

It is shown that improvements in statistical accuracy and sensitivity and high-quality alignment rate result from statistically characterizing alignments by establishing the dependence of statistical parameters on various measures associated with both individual and pairwise profile characteristics.

Island method for estimating the statistical significance of profile-profile alignment scores

- Biology
- BMC Bioinformatics
- 2008

The island statistics can be generalized to profile-profile alignments to provide an efficient method for the alignment score normalization and has a clear speed advantage over the direct shuffling method for comparable accuracy in parameter estimates.

Accelerated Profile HMM Searches

- Computer Science
- PLoS Comput. Biol.
- 2011

An acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.

Parameterizing sequence alignment with an explicit evolutionary model

- Biology
- BMC Bioinformatics
- 2015

This work identifies and implements several probabilistic evolutionary models compatible with the affine-cost insertion/deletion model used in standard pairwise sequence alignment, including one evolutionary model compatible with symmetric pair HMMs that are the basis for Smith-Waterman pairwise alignment, and two evolutionary models compatible with standard profile-based alignment.

Remote homology search with hidden Potts models

- Computer Science, Biology
- bioRxiv
- 2020

A hidden Potts model (HPM) is developed that merges a Potts emission process to a generative probability model of insertion and deletion so they can be applied to sequence alignment and remote homology search using a new model that is based on importance sampling.

Where Does the Alignment Score Distribution Shape Come from?

- Biology
- Evolutionary bioinformatics online
- 2010

A novel score probability distribution is obtained that is qualitatively very similar to that of Karlin-Altschul but performs better than all previous models.

## References

SHOWING 1-10 OF 60 REFERENCES

Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models

- Computer Science
- J. Comput. Biol.
- 2001

The sensitivity of the hybrid method in the detection of sequence homology is found to be comparable to that of the Smith-Waterman alignment and significantly better than the Viterbi version of the probabilistic alignment.

Statistical significance and extremal ensemble of gapped local hybrid alignment

- Computer Science
- 2002

A “semi-probabilistic” alignment algorithm which combines ideas from Smith-Waterman and probabilistic alignment is proposed and studied in detail. It is predicted that the score statistics of this…

Calibrating E-values for hidden Markov models using reverse-sequence null models

- Computer Science
- Bioinform.
- 2005

It is found that using a reverse-sequence null model effectively removes biases owing to sequence length and composition and reduces the number of false positives in a database search.

The estimation of statistical parameters for local alignment score distributions.

- Biology
- Nucleic acids research
- 2001

This work describes a form of the recently described 'island' method in detail, and uses it to investigate the functional dependence of these parameters on finite-length edge effects.

Scoring hidden Markov models

- Computer Science
- Comput. Appl. Biosci.
- 1997

Among the null model choices, a simple looping null model that emits characters according to the geometric mean of the character probabilities in the columns modeled by the hidden Markov model (HMM) performs well or best across all four discrimination experiments.
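
The geometric-mean null model mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the toy two-column model are hypothetical, and renormalizing the geometric means so that they sum to one is an assumption made here to obtain a proper probability distribution.

```python
import math

def geometric_mean_null(columns):
    """Null emission distribution from per-column character probabilities.

    `columns` is a list of dicts mapping each character to its emission
    probability in one model column; all probabilities must be > 0.
    """
    alphabet = list(columns[0].keys())
    n = len(columns)
    # Geometric mean per character, computed as exp(mean of logs) for stability.
    gm = {a: math.exp(sum(math.log(col[a]) for col in columns) / n)
          for a in alphabet}
    # Assumption: renormalize so the null probabilities sum to 1.
    total = sum(gm.values())
    return {a: v / total for a, v in gm.items()}

if __name__ == "__main__":
    # Toy two-letter alphabet and two model columns (hypothetical values).
    cols = [{"A": 0.9, "B": 0.1}, {"A": 0.1, "B": 0.9}]
    print(geometric_mean_null(cols))
```

In this toy case both characters have the same geometric mean across columns, so the resulting null distribution is uniform even though each individual column is strongly biased.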

Hybrid alignment: high-performance with universal statistics

- Computer Science
- Bioinform.
- 2002

Preliminary results using the PfamA database suggest that the hybrid algorithm also achieves performance similar to that of existing methods for position-specific scoring systems, establishing it as a high-performance alignment algorithm with well-characterized, universal statistics.

BALSA: Bayesian algorithm for local sequence alignment.

- Computer Science
- Nucleic acids research
- 2002

A Bayesian algorithm for local sequence alignment (BALSA) that takes into account the uncertainty associated with all unknown variables by incorporating in its forward sums a series of scoring matrices, gap parameters, and all possible alignments.

Accurate formula for P-values of gapped local sequence and profile alignments.

- Computer Science
- Journal of molecular biology
- 2000

A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing the significance of comparisons between two protein sequences or between a sequence and a profile, and factors that affect the accuracy of alignment statistics are investigated.

Rapid Assessment of Extremal Statistics for Gapped Local Alignment

- Computer Science
- ISMB
- 1999

By identifying a complete set of linked clusters, or "islands," this work devises a method that accurately predicts the extremal score statistics using only one to a few pairwise alignments; the method relies crucially on the link between the statistics of island scores and extremal score statistics.

Rapid significance estimation in local sequence alignment with gaps

- Computer Science
- RECOMB
- 2001

A new algorithmic approach is presented that estimates the more important of the Gumbel parameters at least five times faster than traditional methods, bringing significance estimation into the realm of interactive applications.