Corpus ID: 56152183

Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences

@article{Hauser2012DistributionOA,
  title={Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences},
  author={R. Hauser and Heinrich Matzinger},
  journal={arXiv: Probability},
  year={2012}
}
Considering the optimal alignment of two i.i.d. random sequences of length $n$, we show that when the scoring function is chosen randomly, almost surely the empirical distribution of aligned letter pairs in all optimal alignments converges to a unique limiting distribution as $n$ tends to infinity. This result is interesting because it helps understanding the microscopic path structure of a special type of last passage percolation problem with correlated weights, an area of long-standing open… Expand

Figures from this paper

An Upper Bound on the Convergence Rate of a Second Functional in Optimal Sequence Alignment
Consider finite sequences $X_{[1,n]}=X_1\dots X_n$ and $Y_{[1,n]}=Y_1\dots Y_n$ of length $n$, consisting of i.i.d.\ samples of random letters from a finite alphabet, and let $S$ and $T$ be chosenExpand

References

SHOWING 1-10 OF 22 REFERENCES
A Monte Carlo Approach to the Fluctuation Problem in Optimal Alignments of Random Strings
The problem of determining the correct order of fluctuation of the optimal alignment score of two random strings of length $n$ has been open for several decades. It is known that the biased expectedExpand
Estimating statistical significance of sequence alignments.
  • M. Waterman
  • Mathematics, Medicine
  • Philosophical transactions of the Royal Society of London. Series B, Biological sciences
  • 1994
TLDR
The Azuma-Hoeffding inequality gives an upper bound on the probability of large deviations of the score from its mean in the linear case and Poisson approximation can be applied in the logarithmic case. Expand
Detecting the homology of DNA-sequences based on the variety of optimal alignments: a case study
TLDR
The method is based on the observation that for many scor-ing schemes, especially for the longest common subsequence (LCS) scoring, the optimal alignments are the more different the unrelated are the sequences, which gives the idea that the distance between extremal alignments could provide important information about the similarity of the sequences. Expand
Approximation to the mean curve in the LCS problem
The problem of sequence comparison via optimal alignments occurs naturally in many areas of applications. The simplest such technique is based on evaluating a score given by the length of a longestExpand
Extensive simulations for longest common subsequences
Abstract:Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. UsingExpand
Longest common subsequences of two random sequences.
Given two random k-ary sequences of length n, what is f(n,k), the expected length of their longest common subsequence? This problem arises in the study of molecular evolution. We calculate f(n,k) forExpand
The Rate of Convergence of the Mean Length of the Longest Common Subsequence
Given two i.i.d. sequences of n letters from a finite alphabet, one can consider the length Ln of the longest sequence which is a subsequence of both the given sequences. It is known that ELn growsExpand
Extensive simulations for longest common subsequences . Finite size scaling, a cavity solution, and configuration space properties
Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Using extensiveExpand
Standard deviation of the longest common subsequence
Let L n be the length of the longest common subsequence of two independent i.i.d. sequences of Bernoulli variables of length n. We prove that the order of the standard deviation of L n is Vn,Expand
Large deviation based upper bounds for the LCS-problem
We analyse and apply a large deviation and Montecarlo simulation based method for the computation of improved upper bounds on the Chvatal-Sankoff constant for i.i.d. random sequences over a finiteExpand
...
1
2
3
...