Exact asymptotic results for the Bernoulli matching model of sequence alignment.

@article{Majumdar2005ExactAR,
  title={Exact asymptotic results for the Bernoulli matching model of sequence alignment.},
  author={Satya N. Majumdar and Sergei Nechaev},
  journal={Physical review. E, Statistical, nonlinear, and soft matter physics},
  year={2005},
  volume={72 2 Pt 1},
  pages={020901}
}
  • S. Majumdar, S. Nechaev
  • Published 11 October 2004
  • Computer Science, Mathematics
  • Physical review. E, Statistical, nonlinear, and soft matter physics
Finding analytically the statistics of the longest common subsequence (LCS) of a pair of random sequences drawn from c alphabets is a challenging problem in computational evolutionary biology. We present exact asymptotic results for the distribution of the LCS in a simpler, yet nontrivial, variant of the original model called the Bernoulli matching (BM) model. We show that in the BM model, for all c, the distribution of the asymptotic length of the LCS, suitably scaled, is identical to the…
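To make the two models in the abstract concrete, here is a minimal Python sketch. The abstract itself is truncated and does not spell out the BM model, so the definition used here is an assumption based on how the model is commonly described: the LCS length is computed by the usual dynamic-programming recursion over a 0/1 match matrix, and the BM model replaces the true match indicators (1 when the two letters agree) with independent Bernoulli variables that equal 1 with probability 1/c. All function names are hypothetical and chosen for illustration only.

```python
import random


def lcs_length_from_matches(eta):
    """LCS length from a 0/1 match matrix eta (list of rows).

    Recursion: L[i][j] = max(L[i-1][j], L[i][j-1], L[i-1][j-1] + eta[i-1][j-1]).
    """
    n = len(eta)
    m = len(eta[0]) if n else 0
    # L has an extra zero row and column for the empty-prefix boundary.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            L[i][j] = max(L[i - 1][j], L[i][j - 1],
                          L[i - 1][j - 1] + eta[i - 1][j - 1])
    return L[n][m]


def match_matrix_sequences(x, y):
    """Original model: eta[i][j] = 1 exactly when x[i] == y[j]."""
    return [[1 if a == b else 0 for b in y] for a in x]


def match_matrix_bernoulli(n, m, c, rng):
    """Assumed BM model: independent entries, each 1 with probability 1/c."""
    p = 1.0 / c
    return [[1 if rng.random() < p else 0 for _ in range(m)] for _ in range(n)]


def lcs_length(x, y):
    """LCS length of two concrete sequences."""
    return lcs_length_from_matches(match_matrix_sequences(x, y))


if __name__ == "__main__":
    rng = random.Random(0)
    c, n = 4, 200
    # One sample from each model; letters are drawn uniformly from c symbols.
    x = [rng.randrange(c) for _ in range(n)]
    y = [rng.randrange(c) for _ in range(n)]
    print("random sequences:", lcs_length(x, y))
    print("Bernoulli matching:",
          lcs_length_from_matches(match_matrix_bernoulli(n, n, c, rng)))
```

The only difference between the two models in this sketch is how the match matrix is built: entries derived from real sequences are correlated (they share letters along rows and columns), whereas the Bernoulli entries are independent, which is what makes the BM variant simpler to analyze.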

Exact solution of the Bernoulli matching model of sequence alignment
Through a series of exact mappings we reinterpret the Bernoulli model of sequence alignment in terms of the discrete-time totally asymmetric exclusion process with backward sequential update and step
Sparse long blocks and the variance of the LCS
Consider two random strings having the same length and generated by two mutually independent iid sequences taking values uniformly in a common finite alphabet. We study the order of the variance of
Bethe Ansatz in the Bernoulli matching model of random sequence alignment.
Considering the terracelike representation of the sequence alignment problem, the Bethe Ansatz technique is applied via an exact mapping to the five-vertex model on a square lattice to reproduce the results for the averaged length of the longest common subsequence in the Bernoulli approximation.
Deviation from mean in sequence comparison with a periodic sequence
Let Ln denote the length of the longest common subsequence of two sequences of length n. We draw one of the sequences i.i.d., but the other is non-random and periodic. We prove that VAR(Ln) = Θ(n).
Bethe Ansatz Solution of the Finite Bernoulli Matching Model of Sequence Alignment
We map the Bernoulli matching model of sequence alignment to the discrete-time totally asymmetric exclusion process with backward sequential update and step function initial condition. The Bethe
On the Order of the Central Moments of the Length of the Longest Common Subsequences in Random Words
We investigate the order of the r-th, 1 ≤ r < +∞, central moment of the length of the longest common subsequences of two independent random words of size n whose letters are identically distributed
A Central Limit Theorem for the Length of the Longest Common Subsequence in Random Words
Let (Xk)k≥1 and (Yk)k≥1 be two independent sequences of independent identically distributed random variables having the same law and taking their values in a finite alphabet. Let LCn be the length of
Large deviations of the top eigenvalue of large Cauchy random matrices
We compute analytically the large deviation tails of the probability density function (pdf) of the top eigenvalue λ_max in rotationally invariant and heavy-tailed Cauchy ensembles of N × N matrices
A Central Limit Theorem for the Length of the Longest Common Subsequences in Random Words
Let $(X_i)_{i \geq 1}$ and $(Y_i)_{i\geq1}$ be two independent sequences of independent identically distributed random variables taking their values in a common finite alphabet and having the same
A simple derivation of the Tracy-Widom distribution of the maximal eigenvalue of a Gaussian unitary random matrix
In this paper, we first briefly review some recent results on the distribution of the maximal eigenvalue of an (N × N) random matrix drawn from Gaussian ensembles. Next we focus on the Gaussian

References

Extensive simulations for longest common subsequences . Finite size scaling, a cavity solution, and configuration space properties
Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Using extensive
The Rate of Convergence of the Mean Length of the Longest Common Subsequence
Given two i.i.d. sequences of n letters from a finite alphabet, one can consider the length Ln of the longest sequence which is a subsequence of both the given sequences. It is known that ELn grows
Mean-Field Approximations to the Longest Common Subsequence Problem
This work describes a systematic way of incorporating correlations among the matches of two real sequences in the calculation, and obtains closer and closer approximations to the LCS problem.
Longest common subsequences of two random sequences
Given two random k-ary sequences of length n, what is f(n,k), the expected length of their longest common subsequence? This problem arises in the study of molecular evolution. We calculate f(n,k) for
The longest common subsequence problem revisited
This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl
Long Common Subsequences and the Proximity of two Random Strings.
Let $( x_1 ,x_2 , \cdots x_n )$ and $( x'_1 ,x'_2 , \cdots x'_n )$ be two strings from an alphabet $\mathcal{A}$, and let $L_n $ denote their longest common subsequence. The probabilistic behavior
Alignment of molecular sequences seen as random path analysis.
This work focuses on deriving a mathematically rigorous solution to RPA both in its combinatorial form and in its graphical representation, which puts DP in logical perspective under a more general conceptual framework.
Biological sequence analysis
This talk will review a little over a decade's research on applying certain stochastic models to biological sequence analysis, and introduce the motif models in stages, beginning from very simple, non-stochastic versions, progressively becoming more complex, until they reach modern profile HMMs for motifs.