# Sampling rare events: statistics of local sequence alignments.

@article{Hartmann2002SamplingRE, title={Sampling rare events: statistics of local sequence alignments.}, author={Alexander K. Hartmann}, journal={Physical review. E, Statistical, nonlinear, and soft matter physics}, year={2002}, volume={65 5 Pt 2}, pages={ 056102 } }

A method to calculate probability distributions in regions where the events are very unlikely (e.g., p ≈ 10^{-40}) is presented. The basic idea is to map the underlying model onto a physical system. The system is simulated at a low temperature, such that configurations with originally low probabilities are preferentially generated. Since the distribution of such a physical system is known, the original unbiased distribution can be obtained. As an application, local alignment of protein…
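The biasing-and-reweighting idea in the abstract can be illustrated with a minimal sketch on a toy problem. This is not the paper's alignment simulation: the "score" is assumed to be the number of ones in a random bit string, the bias weight is assumed to be exp(score / T), and all function names are hypothetical. The toy is chosen because its exact distribution is binomial, so the reweighted estimate can be checked. The sketch recovers probabilities only up to a constant factor, so it compares a ratio of two tail probabilities.

```python
import math
import random
from collections import Counter

def sample_biased_scores(n_bits, temperature, n_steps, burn_in, seed=0):
    """Metropolis sampling of bit strings with weight exp(score / temperature).

    Toy assumption: the unbiased model is uniform over bit strings and the
    score is the number of ones. A low temperature favors high scores.
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    score = sum(bits)
    histogram = Counter()
    for step in range(n_steps + burn_in):
        i = rng.randrange(n_bits)
        delta = 1 - 2 * bits[i]  # +1 if the flip raises the score, -1 otherwise
        # Accept with probability min(1, exp(delta / temperature)).
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            bits[i] ^= 1
            score += delta
        if step >= burn_in:
            histogram[score] += 1
    return histogram

def unbiased_log_weights(histogram, temperature):
    """Undo the known bias: log p(S) = log h_T(S) - S / T + const."""
    total = sum(histogram.values())
    return {s: math.log(c / total) - s / temperature
            for s, c in histogram.items()}

def exact_log_prob(n_bits, score):
    """Exact log-probability of a score under fair, independent bits."""
    return math.log(math.comb(n_bits, score)) - n_bits * math.log(2)

N_BITS = 40
TEMPERATURE = 1 / 1.5  # illustrative choice, not taken from the paper

hist = sample_biased_scores(N_BITS, TEMPERATURE, n_steps=200_000, burn_in=5_000)
log_w = unbiased_log_weights(hist, TEMPERATURE)

# Compare a log-ratio of two rare scores with the exact binomial value.
estimated = log_w[36] - log_w[30]
exact = exact_log_prob(N_BITS, 36) - exact_log_prob(N_BITS, 30)
print(f"estimated log ratio: {estimated:.3f}, exact: {exact:.3f}")
```

Scores near 36 out of 40 are almost never seen in direct sampling of fair bits, yet the biased chain visits them often, and dividing out the known weight restores their relative probabilities. Fixing the overall normalization requires additional information that this sketch leaves out.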

## 60 Citations

Local sequence alignments statistics: deviations from Gumbel statistics in the rare-event tail

- Biology
- Algorithms for Molecular Biology
- 2006

The results show that the statistics of gapped and ungapped local alignments deviate significantly from the Gumbel distribution in the rare-event tail, even though the Gumbel distribution is usually used when evaluating p-values in databases.

Sequence Alignment Statistics

- Mathematics
- 2010

This chapter gives some simple, useful techniques for approximating the p-values of various types of optimal alignment scores. It starts with general techniques: if, e.g., a dynamic programming…

Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling

- Computer Science
- BMC Bioinformatics
- 2010

An efficient and general method to compute the score distribution to any desired accuracy, combining Markov chain Monte Carlo simulations with importance sampling and generalized ensembles, and extended to a model of transmembrane proteins.

Large-Deviation Properties of Sequence Alignment of Correlated Sequences

- Biology
- J. Comput. Biol.
- 2018

The large deviation method that was used in previous studies is applied to local and global alignment of iid drawn sequences and it is shown that again a correction to the Gumbel distribution is necessary to study the dependence of the parameters on the correlation strength.

Significance of Gapped Sequence Alignments

- Biology, Computer Science
- J. Comput. Biol.
- 2008

This work draws random samples directly from a well chosen, importance-sampling probability distribution to approximate alignment score significance, and shows that the extreme value significance statistic for the local alignment model that is examined does not follow a Gumbel distribution.

New finite-size correction for local alignment score distributions

- Medicine
- BMC Research Notes
- 2012

An improved finite-size correction is presented that considers the distribution of sequence lengths rather than simply the corresponding means. It improves sensitivity and avoids substituting an ad hoc length for short sequences, a substitution that can underestimate the significance of a match.

Mathematical models, algorithms, and statistics of sequence alignment

- Biology, Computer Science
- 2010

This work presents the basic theory of sequence alignment from computational, biological, and statistical perspectives, and analyzes results of computer simulations that effectively illustrate one possible application of this theory.

Score statistics of global sequence alignment from the energy distribution of a modified directed polymer and directed percolation problem.

- Mathematics
- Physical review. E, Statistical, nonlinear, and soft matter physics
- 2005

This work investigates the score statistics of global sequence alignment taking into account, in particular, the compositional bias of the sequences compared, and the possibility of characterizing score statistics for modest system size (sequence lengths), via proper reparametrization of alignment scores, is illustrated.

Estimating statistical significance of local protein profile-profile alignments

- Biology
- bioRxiv
- 2018

It is shown that improvements in statistical accuracy and sensitivity and high-quality alignment rate result from statistically characterizing alignments by establishing the dependence of statistical parameters on various measures associated with both individual and pairwise profile characteristics.

Minimum-free-energy distribution of RNA secondary structures: Entropic and thermodynamic properties of rare events.

- Mathematics
- Physical review. E, Statistical, nonlinear, and soft matter physics
- 2010

Generalized ensemble Markov-chain Monte Carlo methods are used to explore the rare-event tail of the MFE distribution down to probabilities such as 10^{-70} and to study the relationship between the sequence entropy and structural properties for sequence ensembles with fixed MFEs.

## References

Stochastic simulation

- Computer Science
- Wiley series in probability and mathematical statistics: applied probability and statistics
- 1987

Brian D. Ripley's Stochastic Simulation is a short, yet ambitious, survey of modern simulation techniques, and three themes run throughout the book.

Phase transitions and critical phenomena

- Physics
- Computing in Science & Engineering
- 1999

The examination of phase transitions and critical phenomena has dominated statistical physics for the latter half of this century; there is a great theoretical challenge in solving special…

A Modern Course in Statistical Physics

- Physics
- 1981

L E Reichl. 1980. Austin: University of Texas Press. xii + 709 pp. Price $29.95. I can thoroughly recommend this book and congratulate the author for having brought together so many aspects of statistical…

Geometric Sums: Bounds for Rare Events with Applications: Risk Analysis, Reliability, Queueing

- Mathematics
- 1997

Preface. Glossary of Notation. 1. Introduction. 2. Miscellaneous Probability Topics. 3. Generalized Renyi Theorem. 4. Two-Sided Bounds. 5. Metric Bounds. 6. Ruin Probability. 7. Reliability…

A modern course in statistical physics

- Physics
- 1980

THERMODYNAMICS. Introduction to Thermodynamics. The Thermodynamics of Phase Transitions. CONCEPTS FROM PROBABILITY THEORY. Elementary Probability Theory and Limit Theorems. Stochastic Dynamics and…

From the U. S. A.

- Mathematics
- 1965

Abstract. Let $P = \{p_i \mid i \in I\}$ and $Q = \{q_i \mid i \in I\}$ be sets of partial functions with the same index set $I$. We say that $\Phi$ is an interpolating function (from $P$ to $Q$) if $\Phi(p_i) = q_i$ for each $i$. We give…

“Bioinformatics” 특집을 내면서 (Introducing the special issue on “Bioinformatics”)

- Business, Medicine
- 2000

Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.

Nucleic Acids Res

- Nucleic Acids Res