Sampling rare events: statistics of local sequence alignments.

@article{Hartmann2002SamplingRE,
  title={Sampling rare events: statistics of local sequence alignments.},
  author={Alexander K. Hartmann},
  journal={Physical review. E, Statistical, nonlinear, and soft matter physics},
  year={2002},
  volume={65},
  number={5 Pt 2},
  pages={056102}
}
  • A. Hartmann
  • Published 13 August 2001
  • Computer Science
  • Physical review. E, Statistical, nonlinear, and soft matter physics
A method to calculate probability distributions in regions where the events are very unlikely (e.g., p ≈ 10^{-40}) is presented. The basic idea is to map the underlying model onto a physical system. The system is simulated at a low temperature, so that configurations with originally low probabilities are preferentially generated. Since the distribution of such a physical system is known, the original unbiased distribution can be obtained. As an application, local alignment of protein… 
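The biasing-and-reweighting idea in the abstract can be illustrated with a minimal toy sketch. This is not the paper's alignment code: the "model" here is an assumed stand-in (a string of fair random bits whose score S is the number of ones), and the function names `sample_biased_scores` and `log_relative_probability` are hypothetical. Configurations are sampled with a Metropolis chain at temperature T using the weight exp(S/T), which favors high scores, and the known bias is then divided out to recover ratios of the original probabilities. Fixing the overall normalization (for example by matching against direct sampling in a region where both overlap) is left out of the sketch.

```python
import math
import random
from collections import Counter


def sample_biased_scores(n_bits, temperature, n_steps, burn_in=1000, seed=0):
    """Metropolis sampling of bit strings with weight exp(S / T).

    S is the number of ones (toy stand-in for a score). A low temperature
    pushes the chain toward high-S strings that are rare for fair bits.
    Returns a histogram of the scores visited after burn-in.
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    score = sum(bits)
    histogram = Counter()
    for step in range(n_steps):
        i = rng.randrange(n_bits)
        delta = 1 - 2 * bits[i]  # +1 when flipping 0 -> 1, -1 when 1 -> 0
        # Metropolis rule: always accept score increases, otherwise
        # accept with probability exp(delta / T).
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            bits[i] ^= 1
            score += delta
        if step >= burn_in:
            histogram[score] += 1
    return histogram


def log_relative_probability(histogram, temperature, s_a, s_b):
    """Estimate log[P(s_a) / P(s_b)] for the original, unbiased model.

    The biased chain visits score s with frequency proportional to
    P(s) * exp(s / T), so dividing out exp(s / T) undoes the bias.
    """
    log_biased_ratio = math.log(histogram[s_a] / histogram[s_b])
    return log_biased_ratio - (s_a - s_b) / temperature


n_bits, temperature = 40, 0.5
hist = sample_biased_scores(n_bits, temperature, n_steps=500_000)
estimate = log_relative_probability(hist, temperature, 40, 35)
# For fair bits the exact answer is a ratio of binomial coefficients.
exact = math.log(math.comb(n_bits, 40) / math.comb(n_bits, 35))
print(f"estimated log ratio: {estimate:.2f}, exact: {exact:.2f}")
```

In this toy case the all-ones string has probability 2^{-40} under direct sampling and would essentially never appear in a run of this length, whereas the biased chain visits it regularly, which is what makes the reweighted estimate possible.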

Local sequence alignments statistics: deviations from Gumbel statistics in the rare-event tail
TLDR
The results show that the statistics of gapped and ungapped local alignments deviate significantly from the Gumbel distribution in the rare-event tail, even though the Gumbel form is usually used when evaluating p-values in databases.
Sequence Alignment Statistics
This chapter gives some simple, useful techniques for approximating the p-values of various types of optimal alignment scores. It starts with general techniques: if, e.g., a dynamic programming
Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling
TLDR
An efficient and general method to compute the score distribution to any desired accuracy, combining Markov chain Monte Carlo simulations with importance sampling and generalized ensembles, and extended to a model of transmembrane proteins.
Large-Deviation Properties of Sequence Alignment of Correlated Sequences
TLDR
The large deviation method that was used in previous studies is applied to local and global alignment of iid drawn sequences and it is shown that again a correction to the Gumbel distribution is necessary to study the dependence of the parameters on the correlation strength.
Significance of Gapped Sequence Alignments
  • L. Newberg
  • Biology, Computer Science
    J. Comput. Biol.
  • 2008
TLDR
This work draws random samples directly from a well chosen, importance-sampling probability distribution to approximate alignment score significance, and shows that the extreme value significance statistic for the local alignment model that is examined does not follow a Gumbel distribution.
New finite-size correction for local alignment score distributions
TLDR
An improved finite-size correction is presented that considers the distribution of sequence lengths rather than simply the corresponding means and improves sensitivity and avoids substituting an ad hoc length for short sequences that can underestimate the significance of a match.
Mathematical models, algorithms, and statistics of sequence alignment
TLDR
This work presents the basic theory of sequence alignment from computational, biological, and statistical perspectives, and analyzes results of computer simulations that effectively illustrate one possible application of this theory.
Score statistics of global sequence alignment from the energy distribution of a modified directed polymer and directed percolation problem.
TLDR
This work investigates the score statistics of global sequence alignment taking into account, in particular, the compositional bias of the sequences compared, and the possibility of characterizing score statistics for modest system size (sequence lengths), via proper reparametrization of alignment scores, is illustrated.
Estimating statistical significance of local protein profile-profile alignments
TLDR
It is shown that improvements in statistical accuracy and sensitivity and high-quality alignment rate result from statistically characterizing alignments by establishing the dependence of statistical parameters on various measures associated with both individual and pairwise profile characteristics.
Minimum-free-energy distribution of RNA secondary structures: Entropic and thermodynamic properties of rare events.
TLDR
Generalized ensemble Markov-chain Monte Carlo methods are used to explore the rare-event tail of the MFE distribution down to probabilities such as 10^{-70} and to study the relationship between the sequence entropy and structural properties for sequence ensembles with fixed MFEs.
