Longest Common Prefixes with k-Errors and Applications

@inproceedings{Ayad2018LongestCP,
  title={Longest Common Prefixes with k-Errors and Applications},
  author={Lorraine A. K. Ayad and Panagiotis Charalampopoulos and Costas S. Iliopoulos and Solon P. Pissis},
  booktitle={SPIRE},
  year={2018}
}
Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. [...] We show that our technique is applicable to several algorithmic problems in computational biology and elsewhere.

Faster Algorithms for Longest Common Substring

TLDR
An $O(n \log^{k-1/2} n)$-time algorithm is shown, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al.

Time-Space Tradeoffs for Finding a Long Common Substring

TLDR
A significant speed-up is obtained for instances where the length of the sought LCS is large, based on techniques originating from the LCS with Mismatches problem, on space-efficient locally consistent parsing, and on the structure of maximal repetitions in the input documents.

Longest Property-Preserved Common Factor

TLDR
This paper considers two fundamental string properties, square-free factors and periodic factors, under two different settings (one per property), and presents linear-time solutions for both settings.

Linear-Time Algorithm for Long LCF with k Mismatches

In the Longest Common Factor with $k$ Mismatches (LCF$_k$) problem, we are given two strings $X$ and $Y$ of total length $n$, and we are asked to find a pair of maximal-length factors, one of $X$ and

Faster Algorithms for 1-Mappability of a Sequence

TLDR
Two new algorithms that require worst-case time and space for integer alphabets of size \(m=\varOmega (\log _\sigma n)\) are presented, thus greatly improving the state of the art.

supporting time-optimal queries with $O(\log^2 n)$ time for updates

TLDR
The techniques developed can be applied to obtain fully dynamic algorithms for all of the analogously restricted dynamic variants of problems on strings and are applied to computing the solution for a string with a given set of k edits, which leads to answering internal queries on a string.

Longest property-preserved common factor: A new string-processing framework

Dynamic and Internal Longest Common Substring

TLDR
The first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation is presented, and dynamic sublinear-time algorithms for both the longest palindrome and Lyndon factorization of a string after a single edit operation are developed.

Pattern Masking for Dictionary Matching

TLDR
It is shown, through a reduction from the well-known $k$-Clique problem, that a decision version of the PMDM problem is NP-complete, even for strings over a binary alphabet.

SMART: SuperMaximal approximate repeats tool

TLDR
This talk will present SMART, a tool based on recent algorithmic advances implemented in C++ to compute supermaximal k-mismatch repeats directly and show that the elements SMART outputs are statistically much more significant than the output of the state-of-the-art tools.

References

Longest Common Prefixes with k-Mismatches and Applications

TLDR
The proposed algorithm for computing the longest prefix of each suffix of a given string of length n over a constant-sized alphabet of size \(\sigma\) that occurs elsewhere in the string with Hamming distance at most k can be directly applied to the problem of genome mappability.
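
To make the quantity in this entry concrete, the following is a minimal brute-force sketch, not the algorithm proposed in the paper. For each suffix of a string it computes the length of the longest prefix that also occurs at another position with Hamming distance at most k. The function names `lcp_k_mismatches` and `longest_prefix_elsewhere` are hypothetical, and the quadratic-or-worse running time is only meant to illustrate the definition.

```python
def lcp_k_mismatches(s, i, j, k):
    """Length of the longest common prefix of s[i:] and s[j:]
    when up to k mismatching positions are tolerated."""
    length = 0
    mismatches = 0
    while i + length < len(s) and j + length < len(s):
        if s[i + length] != s[j + length]:
            mismatches += 1
            if mismatches > k:
                break  # the (k+1)-th mismatch ends the prefix
        length += 1
    return length


def longest_prefix_elsewhere(s, k):
    """For every start position i, the longest prefix of s[i:] that occurs
    at some other start position j != i with at most k mismatches.
    Naive illustration only (hypothetical helper, not the paper's method)."""
    n = len(s)
    return [
        max((lcp_k_mismatches(s, i, j, k) for j in range(n) if j != i), default=0)
        for i in range(n)
    ]


if __name__ == "__main__":
    print(longest_prefix_elsewhere("banana", 0))
    print(longest_prefix_elsewhere("banana", 1))
```

Setting k = 0 reduces this to the exact longest-repeated-prefix values; larger k lets a prefix extend across mismatching positions, which is the relaxation that genome mappability computations rely on.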

Longest Common Prefix with Mismatches

TLDR
An algorithm is proposed that computes, for each text suffix, the length of its longest prefix that occurs elsewhere in the text with at most one mismatch, and a second algorithm is described and analysed that uses a greedy strategy to reduce the amount of computation.

A note on the longest common substring with k-mismatches problem

  • S. Grabowski
  • Computer Science, Mathematics
    Inf. Process. Lett.
  • 2015

Longest Common Substring with Approximately k Mismatches

TLDR
A conditional lower bound based on SETH is shown, implying that there is little hope of improving existing solutions; a strongly subquadratic-time 2-approximation algorithm for the longest common substring with k mismatches problem is obtained, and conditional hardness of improving its approximation ratio is also shown.

Deterministic Indexing for Packed Strings

TLDR
A new string index is created in the deterministic and packed setting such that, given a packed pattern string of length m, the authors can support queries in (deterministic) time $O(m/\alpha + \log m + \log\log\sigma)$, where $\alpha = w/\log\sigma$ is the number of characters packed in a word of size $w = \log n$.

Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis

TLDR
A novel algorithmic framework is presented for solving approximate sequence matching problems that permit a bounded total number k of mismatches, insertions, and deletions; it is expected to be a broadly applicable theoretical tool and may inspire the design of practical heuristics and software.

A Provably Efficient Algorithm for the k-Mismatch Average Common Substring Problem

TLDR
This article presents the first provably efficient algorithm for the k-mismatch average common substring ($\mathrm{ACS}_k$) problem that takes $O(n)$ space and $O(n \log^k n)$ time in the worst case for any constant k.

Optimal suffix tree construction with large alphabets

  • M. Farach
  • Computer Science
    Proceedings 38th Annual Symposium on Foundations of Computer Science
  • 1997
TLDR
This work builds suffix trees in linear time for integer alphabets, closing the gap left by comparison-based construction such as Weiner's algorithm, which matches a trivial $\Omega(n \log n)$-time lower bound based on sorting.

Suffix arrays: a new method for on-line string searches

TLDR
A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper, and it is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
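
As a rough illustration of the data structure described in this entry, the sketch below builds a suffix array by directly sorting suffixes and then answers a pattern query with two binary searches. This is a deliberately naive construction chosen for brevity, not the construction from the paper; the names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.
    Naive: sorts the suffixes directly, which is fine for short strings."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions of pattern in text, found by binary search on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Because the suffixes are sorted, all suffixes that begin with the pattern form one contiguous block of the array, which is why two binary searches are enough to locate every occurrence.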

kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison

TLDR
This work describes kmacs, an efficient implementation of this idea based on generalized enhanced suffix arrays, and presents a greedy heuristic to approximate the length of such k-mismatch substrings by considering longest common substrings with k mismatches.
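
The greedy idea mentioned in this entry can be sketched as follows, under stated assumptions: for each position in the first sequence, pick a position in the second sequence with the longest exact common prefix, then extend that match without gaps until the (k+1)-th mismatch. The real tool uses generalized enhanced suffix arrays to find the exact matches; this sketch swaps in a quadratic scan instead, breaks ties by taking the first candidate, and uses hypothetical function names throughout.

```python
def exact_lcp(a, i, b, j):
    """Length of the longest exact common prefix of a[i:] and b[j:]."""
    n = 0
    while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
        n += 1
    return n


def extend_with_mismatches(a, i, b, j, k):
    """Length of the gap-free match of a[i:] against b[j:] that stops
    just before the (k+1)-th mismatch or at the end of either string."""
    length = 0
    mismatches = 0
    while i + length < len(a) and j + length < len(b):
        if a[i + length] != b[j + length]:
            if mismatches == k:
                break
            mismatches += 1
        length += 1
    return length


def greedy_k_mismatch_lengths(a, b, k):
    """For each position i in a: anchor at the position of b with the
    longest exact match (first one on ties, an assumption of this sketch),
    then extend that anchor allowing up to k mismatches."""
    lengths = []
    for i in range(len(a)):
        best_j = max(range(len(b)), key=lambda j: exact_lcp(a, i, b, j), default=None)
        lengths.append(0 if best_j is None else extend_with_mismatches(a, i, b, best_j, k))
    return lengths


if __name__ == "__main__":
    print(greedy_k_mismatch_lengths("abcd", "abxd", 1))
```

The result is only a heuristic estimate: anchoring at the longest exact match does not guarantee the longest k-mismatch match overall, which is the trade-off such a greedy strategy accepts for speed.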