Improved bounds on the average length of longest common subsequences

@inproceedings{Lueker2009ImprovedBO,
  title={Improved bounds on the average length of longest common subsequences},
  author={George S. Lueker},
  booktitle={JACM},
  year={2009}
}
  • G. S. Lueker
  • Published in JACM 12 January 2003
  • Computer Science
It has long been known [Chvátal and Sankoff 1975] that the average length of the longest common subsequence of two random strings of length $n$ over an alphabet of size $k$ is asymptotic to $\gamma_k n$ for some constant $\gamma_k$ depending on $k$. The value of these constants remains unknown, and a number of papers have proved upper and lower bounds on them. We discuss techniques, involving numerical calculations with recurrences on many variables, for…
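As a rough illustration of what $\gamma_k$ measures, the following Python sketch estimates $\mathbb{E}[L_n]/n$ by Monte Carlo. It is not the recurrence-based bounding technique of the paper. It draws pairs of uniform random strings over an alphabet of size $k$ and computes their LCS length with the standard $O(n^2)$ dynamic program. The function names, sample sizes, and seed are illustrative assumptions. A finite-$n$ average is only a heuristic indication of $\gamma_k$, not a rigorous bound.

```python
import random


def lcs_length(a, b):
    """LCS length of a and b via the classic DP (O(len(a)*len(b)) time, one row of memory)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1               # extend a common subsequence by x == y
            else:
                cur[j] = max(prev[j], cur[j - 1])      # drop the last symbol of a or of b
        prev = cur
    return prev[-1]


def estimate_gamma(k, n, trials, seed=0):
    """Monte Carlo estimate of E[L_n]/n for two uniform random strings of length n over k symbols."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    for k in (2, 4):
        print(k, round(estimate_gamma(k, n=300, trials=5), 3))
```

Each sample costs $O(n^2)$ time, so increasing `n` to reduce the finite-size bias quickly becomes expensive in pure Python.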


On the Convergence of Upper Bound Techniques for the Average Length of Longest Common Subsequences
TLDR
It is shown that, for arbitrary $k$, a sufficient condition for a parameterized method to produce a sequence of upper bounds approaching the true value of $\gamma_k$ is met, and that a generalization of the method of [6] meets this condition for all $k \ge 2$.
The Length of the Longest Common Subsequence of Two Independent Mallows Permutations
TLDR
This paper focuses on the case in which the strings are generated uniformly at random from a given alphabet; there, the expected length of the LCS of two random $k$-ary sequences of length $n$, normalized by $n$, converges to a constant $\gamma_k$.
An Improved Bound on the Fraction of Correctable Deletions
TLDR
The largest fraction of correctable deletions for $k$-ary codes is pinned down as $1-\Theta(1/k)$, since $1-1/k$ is an upper bound even for the simpler model of erasures, where the locations of the missing symbols are known.
Systematic assessment of the expected length, variance and distribution of Longest Common Subsequences
TLDR
This work systematically analyzes the expected length, variance and distribution of the LCS based on extensive Monte Carlo simulation; the results on expected length are consistent with currently proven theoretical results, and the analyses of variance and distribution provide further insight into the problem.
On the Variance of the Length of the Longest Common Subsequences in Random Words With an Omitted Letter
TLDR
The order of the variance of the length of the longest common subsequences of two independent random words of size $n$ is shown to be linear in $n$.
Length of the Longest Common Subsequence between Overlapping Words
TLDR
It is proved that the expected length of an LCS is approximately $\max(\ell, \mathbb{E}[L_n])$, where $L_n$ is the length of an LCS between two independent random sequences.
Longest common subsequences between words of very unequal length
TLDR
It is shown that the expected length of the longest common subsequence between two random words of lengths $n$ and $(1-\varepsilon)kn$ over a $k$-symbol alphabet is of the order $1-c\varepsilon^2$ uniformly in $k$ and $\varepsilon$.
Covering Codes Using Insertions or Deletions
TLDR
Their upper bounds have an optimal dependence on the word length, and the authors achieve asymptotic density matching the best known bounds for Hamming distance covering codes.
Multivariate Fine-Grained Complexity of Longest Common Subsequence
TLDR
This work presents a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature, and determines the optimal running time for LCS under SETH as $(n+\min\{d, \delta \Delta,\delta m\})^{1\pm o(1)}$.
Sparse Long Blocks and the Micro-structure of the Longest Common Subsequences
TLDR
It is shown that for sufficiently long strings the optimal alignment (OA) corresponding to a longest common subsequence (LCS) treats the inserted block very differently depending on the size of the alphabet.

References

Showing references 1-10 of 32.
Expected length of longest common subsequences
TLDR
The methods used for producing bounds on the expected length of a longest common subsequence of two sequences are also applied to other problems, namely a longest common subsequence of several sequences, a shortest common supersequence, and maximal adaptability.
The Rate of Convergence of the Mean Length of the Longest Common Subsequence
Given two i.i.d. sequences of $n$ letters from a finite alphabet, one can consider the length $L_n$ of the longest sequence which is a subsequence of both the given sequences. It is known that $\mathbb{E}L_n$ grows…
The longest common subsequence problem revisited
This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Let l…
Expected Length of the Longest Common Subsequence for Large Alphabets
TLDR
It is proved that a conjecture of Sankoff and Mainville from the early 1980s, claiming that $\gamma_k\sqrt{k} \to 2$ as $k \to \infty$, is true.
Bounding the Expected Length of Longest Common Subsequences and Forests
We present improvements to two techniques to find lower and upper bounds for the expected length of longest common subsequences and forests of two random sequences of the same length, over…
Longest common subsequences of two random sequences
Given two random $k$-ary sequences of length $n$, what is $f(n,k)$, the expected length of their longest common subsequence? This problem arises in the study of molecular evolution. We calculate $f(n,k)$ for…
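For very small $n$, $f(n,k)$ can be computed exactly by brute force. The sketch below assumes exhaustive enumeration is affordable, which holds only for tiny $n$ and $k$, and it is not the method of the cited paper. It averages the LCS length over all $k^{2n}$ ordered pairs using exact rational arithmetic. Splitting two random strings of length $m+n$ into independent prefixes and suffixes shows $f(m+n,k) \ge f(m,k) + f(n,k)$, so by Fekete's lemma $\gamma_k = \sup_n f(n,k)/n$, and every exact value $f(n,k)/n$ is a valid, if weak, lower bound on $\gamma_k$. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def lcs_length(a, b):
    # Classic LCS dynamic program, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def expected_lcs(n, k):
    """Exact f(n, k): mean LCS length over all ordered pairs of length-n words on k symbols."""
    words = list(product(range(k), repeat=n))
    total = sum(lcs_length(a, b) for a in words for b in words)
    return Fraction(total, len(words) ** 2)


if __name__ == "__main__":
    for n in range(1, 7):
        f = expected_lcs(n, 2)
        print(n, f, float(f / n))  # each f/n is a lower bound on gamma_2
```

The cost grows like $k^{2n} n^2$, which is why bounds of the kind discussed in these papers rely on cleverer recurrences rather than enumeration.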
Algorithms for the Longest Common Subsequence Problem
TLDR
One algorithm is applicable in the general case and requires $O(pn + n\log n)$ time, where $p$ is the length of the longest common subsequence, for any input strings of lengths $m$ and $n$, even though the $\Omega(mn)$ lower bound on time need not apply to all inputs.
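The observation that a quadratic time bound need not govern every input can be illustrated with an output-sensitive method. The sketch below is not the algorithm of the cited paper. It is a compact version of a different, well-known match-list approach in the style of Hunt and Szymanski, which reduces LCS to a longest strictly increasing subsequence over matching positions. Its running time is $O(m + n + r\log n)$, where $r$ is the number of pairs $(i, j)$ with $a_i = b_j$, so it is fast when matches are sparse. The function name `lcs_length_matches` is hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_matches(a, b):
    """LCS length via match lists and a longest strictly increasing subsequence."""
    # For every symbol, record the positions where it occurs in b (in increasing order).
    positions = defaultdict(list)
    for j, ch in enumerate(b):
        positions[ch].append(j)
    # tails[t] is the smallest index in b that can end a common subsequence of
    # length t + 1 using the prefix of a processed so far.
    tails = []
    for ch in a:
        # Visit matches in decreasing j so one symbol of a is never used twice.
        for j in reversed(positions.get(ch, ())):
            t = bisect_left(tails, j)
            if t == len(tails):
                tails.append(j)
            else:
                tails[t] = j
    return len(tails)


if __name__ == "__main__":
    print(lcs_length_matches("ABCBDAB", "BDCABA"))  # prints 4
```

When almost every pair of positions matches (for example over a binary alphabet), $r$ is close to $mn$ and this approach loses its advantage over the plain dynamic program.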
On a Speculated Relation Between Chvátal–Sankoff Constants of Several Sequences
TLDR
It is proven that, when normalized by $n$, the expected length of a longest common subsequence of $d$ sequences of length $n$ over an alphabet of size $\sigma$ converges to a constant $\gamma_{\sigma,d}$, and some new lower bounds for $\gamma_{\sigma,d}$ are obtained when both $\sigma$ and $d$ are small integers.
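The constant $\gamma_{\sigma,d}$ can be illustrated for $d = 3$ by extending the two-string dynamic program to a three-dimensional table. The Monte Carlo estimator below is an illustrative sketch: the function names, sample sizes, and seed are assumptions, the table costs $O(n^3)$ time per sample, and a finite-$n$ average is only a heuristic indication of the limit, not a proven bound.

```python
import random


def lcs3_length(a, b, c):
    """LCS length of three sequences via the O(|a||b||c|) dynamic program."""
    nb, nc = len(b), len(c)
    # prev[j][l] holds the LCS length of the processed prefix of a (without the
    # current symbol), b[:j] and c[:l].
    prev = [[0] * (nc + 1) for _ in range(nb + 1)]
    for x in a:
        cur = [[0] * (nc + 1) for _ in range(nb + 1)]
        for j in range(1, nb + 1):
            y = b[j - 1]
            for l in range(1, nc + 1):
                if x == y == c[l - 1]:
                    cur[j][l] = prev[j - 1][l - 1] + 1  # all three end with the same symbol
                else:
                    # Drop the last symbol of a, b, or c and keep the best result.
                    cur[j][l] = max(prev[j][l], cur[j - 1][l], cur[j][l - 1])
        prev = cur
    return prev[nb][nc]


def estimate_gamma_d3(sigma, n, trials, seed=0):
    """Monte Carlo estimate of E[LCS of 3 random words]/n over an alphabet of size sigma."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        words = [[rng.randrange(sigma) for _ in range(n)] for _ in range(3)]
        total += lcs3_length(*words)
    return total / (trials * n)


if __name__ == "__main__":
    print(round(estimate_gamma_d3(2, n=60, trials=5), 3))
```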
On the Approximation of Shortest Common Supersequences and Longest Common Subsequences
Finding shortest common supersequences (SCS) and longest common subsequences (LCS) for a given set of sequences are two well-known NP-hard problems. They have important applications in many areas…
Longest Common Subsequences
TLDR
Some combinatorial properties of the sub- and super-sequence relations are explored, various algorithms for computing the LLCS (the length of a longest common subsequence) are surveyed, and some results on the expected LLCS for pairs of random strings are presented.