Longest Common Prefixes with k-Errors and Applications

@inproceedings{Ayad2018LongestCP,
  title={Longest Common Prefixes with k-Errors and Applications},
  author={Lorraine A. K. Ayad and Panagiotis Charalampopoulos and Costas S. Iliopoulos and Solon P. Pissis},
  booktitle={SPIRE},
  year={2018}
}
Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. We show that our technique is applicable to several algorithmic problems in computational biology and elsewhere.

Faster Algorithms for Longest Common Substring

An $O(n \log^{k-1/2} n)$-time algorithm is shown, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al.

Time-Space Tradeoffs for Finding a Long Common Substring

A significant speed-up is obtained for instances where the length of the sought LCS is large, based on techniques originating from the LCS with Mismatches problem, on space-efficient locally consistent parsing, and on the structure of maximal repetitions in the input documents.

Longest Property-Preserved Common Factor

This paper considers two fundamental string properties, square-free factors and periodic factors, under two different settings (one per property), and presents linear-time solutions for both settings.

Linear-Time Algorithm for Long LCF with k Mismatches

In the Longest Common Factor with $k$ Mismatches (LCF$_k$) problem, we are given two strings $X$ and $Y$ of total length $n$, and we are asked to find a pair of maximal-length factors, one of $X$ and

Faster Algorithms for 1-Mappability of a Sequence

Two new algorithms that require worst-case time and space for integer alphabets of size \(m=\varOmega (\log _\sigma n)\) are presented, thus greatly improving the state of the art.

supporting time-optimal queries with O ( log 2 n ) time for updates

The techniques developed can be applied to obtain fully dynamic algorithms for all of the analogously restricted dynamic variants of problems on strings and are applied to computing the solution for a string with a given set of k edits, which leads to answering internal queries on a string.

Longest property-preserved common factor: A new string-processing framework

Dynamic and Internal Longest Common Substring

The first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation is presented, and dynamic sublinear-time algorithms for both the longest palindrome and Lyndon factorization of a string after a single edit operation are developed.

Pattern Masking for Dictionary Matching

It is shown, through a reduction from the well-known $k$-Clique problem, that a decision version of the PMDM problem is NP-complete, even for strings over a binary alphabet.

SMART: SuperMaximal approximate repeats tool

This talk will present SMART, a tool based on recent algorithmic advances implemented in C++ to compute supermaximal k-mismatch repeats directly and show that the elements SMART outputs are statistically much more significant than the output of the state-of-the-art tools.

References

Longest Common Prefixes with k-Mismatches and Applications

The proposed algorithm for computing the longest prefix of each suffix of a given string of length n over a constant-sized alphabet of size \(\sigma\) that occurs elsewhere in the string with Hamming distance at most k can be directly applied to the problem of genome mappability.
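
For orientation, the mappability problem mentioned above can be stated as a brute-force computation. The sketch below assumes the usual formulation: for every length-m window of the text, count how many other windows lie within Hamming distance k. The names `hamming_within` and `mappability` are hypothetical, and this quadratic loop is only an illustration of the definition, not the algorithm proposed in the paper.

```python
def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def mappability(text: str, m: int, k: int) -> list[int]:
    """For each length-m window, count the other windows within Hamming distance k."""
    windows = max(len(text) - m + 1, 0)
    counts = [0] * windows
    for i in range(windows):
        for j in range(i + 1, windows):
            if hamming_within(text[i:i + m], text[j:j + m], k):
                # The relation is symmetric, so credit both windows.
                counts[i] += 1
                counts[j] += 1
    return counts
```

A window with count 0 is unique up to k mismatches, which is the case of interest when mapping reads back to a genome.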

Longest Common Prefix with Mismatches

An algorithm is proposed that computes, for each text suffix, the length of its longest prefix that occurs elsewhere in the text with at most one mismatch, and a second algorithm is described and analysed that uses a greedy strategy to reduce the amount of computation.
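
The quantity described above can be pinned down with a naive reference implementation. This is a minimal sketch that compares every pair of suffixes directly; the function names are hypothetical, and the cubic worst-case running time is far from the algorithms the paper describes. It is useful mainly as a correctness baseline on small inputs.

```python
def lcp_with_mismatches(a: str, b: str, k: int) -> int:
    """Length of the longest common prefix of a and b with at most k mismatches."""
    mismatches = 0
    length = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                break
        length += 1
    return length


def longest_repeated_prefix(text: str, k: int = 1) -> list[int]:
    """For each suffix, the longest prefix that occurs at another position
    of the text with at most k mismatches (brute force over all pairs)."""
    n = len(text)
    result = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                candidate = lcp_with_mismatches(text[i:], text[j:], k)
                result[i] = max(result[i], candidate)
    return result
```

Setting `k=0` reduces this to the exact longest-repeated-prefix lengths, which makes the effect of allowing one mismatch easy to compare.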

A note on the longest common substring with k-mismatches problem

  • S. Grabowski
  • Computer Science, Mathematics
    Inf. Process. Lett.
  • 2015

Longest Common Substring with Approximately k Mismatches

A conditional lower bound based on SETH is shown, implying that there is little hope of improving existing solutions; a strongly subquadratic-time 2-approximation algorithm for the longest common substring with k mismatches problem is obtained, and conditional hardness of improving its approximation ratio is shown.
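
To make the underlying problem concrete, here is a sketch of the simple quadratic-time exact baseline: slide a window along every diagonal of the alignment matrix and keep at most k mismatches inside it. This is not the approximation algorithm from the entry above; the function name and the returned tuple layout are assumptions made for illustration.

```python
from collections import deque


def lcs_k_mismatches(x: str, y: str, k: int) -> tuple[int, int, int]:
    """Return (length, start_in_x, start_in_y) of a longest pair of
    equal-length substrings of x and y with Hamming distance at most k."""
    best = (0, 0, 0)
    n, m = len(x), len(y)
    for d in range(-(m - 1), n):
        # Diagonal d pairs x[i + t] with y[j + t].
        i, j = (d, 0) if d >= 0 else (0, -d)
        diag_len = min(n - i, m - j)
        mismatch_positions = deque()
        left = 0  # window is [left, t] along the diagonal
        for t in range(diag_len):
            if x[i + t] != y[j + t]:
                mismatch_positions.append(t)
                if len(mismatch_positions) > k:
                    # Drop the oldest mismatch by moving the window past it.
                    left = mismatch_positions.popleft() + 1
            length = t - left + 1
            if length > best[0]:
                best = (length, i + left, j + left)
    return best
```

Each diagonal is scanned once, so the total work is proportional to the product of the two string lengths, independent of k.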

Deterministic Indexing for Packed Strings

A new string index is created in the deterministic and packed setting such that, given a packed pattern string of length m, the authors can support queries in (deterministic) time O(m/α + log m + log log σ), where α = w / log σ is the number of characters packed in a word of size w = log n.

Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis

A novel algorithmic framework is presented for solving approximate sequence matching problems that permit a bounded total number k of mismatches, insertions, and deletions; it is expected to be a broadly applicable theoretical tool and may inspire the design of practical heuristics and software.

A Provably Efficient Algorithm for the k-Mismatch Average Common Substring Problem

This article presents the first provably efficient algorithm for the k-mismatch average common substring (ACS_k) problem that takes O(n) space and O(n log^k n) time in the worst case for any constant k.

Optimal suffix tree construction with large alphabets

  • M. Farach
  • Computer Science
    Proceedings 38th Annual Symposium on Foundations of Computer Science
  • 1997
This work builds suffix trees in linear time for integer alphabets using Weiner's algorithm, which matches a trivial Ω(n log n)-time lower bound based on sorting.

Suffix arrays: a new method for on-line string searches

A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper, and it is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
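
As a small illustration of the data structure, the sketch below builds a suffix array by directly sorting suffixes and then locates a pattern with two binary searches. The function names are hypothetical, and the construction shown here is the naive one (sorting full suffixes), not the construction analysed in the paper.

```python
def build_suffix_array(text: str) -> list[int]:
    """Starting positions of all suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All starting positions of pattern in text, found by binary search on sa."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])
```

For example, `find_occurrences("banana", build_suffix_array("banana"), "ana")` searches the sorted suffixes rather than scanning the text, which is the property that makes the structure attractive for repeated queries.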

kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison

This work describes kmacs, an efficient implementation of this idea based on generalized enhanced suffix arrays, and presents a greedy heuristic to approximate the length of such k-mismatch substrings by considering longest common substrings with k mismatches.
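
The greedy idea summarized above can be sketched as follows: for each position of the first sequence, find a longest exact match in the second sequence, then extend that one alignment without gaps through up to k mismatches. This is a brute-force illustration under stated assumptions; it does not use enhanced suffix arrays, the tie-breaking rule (first longest match wins) is my own choice, and the function names are hypothetical rather than taken from the kmacs software.

```python
def exact_lcp(a: str, i: int, b: str, j: int) -> int:
    """Length of the longest exact common prefix of a[i:] and b[j:]."""
    length = 0
    while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
        length += 1
    return length


def greedy_k_mismatch_lengths(s1: str, s2: str, k: int) -> list[int]:
    """For each position i of s1, a greedy estimate of the longest substring
    starting at i that matches some substring of s2 with at most k mismatches."""
    if not s2:
        return [0] * len(s1)
    lengths = []
    for i in range(len(s1)):
        # Step 1: longest exact match starting at i (first one wins on ties).
        best_j, best_len = 0, -1
        for j in range(len(s2)):
            current = exact_lcp(s1, i, s2, j)
            if current > best_len:
                best_j, best_len = j, current
        # Step 2: extend the same alignment through up to k mismatches.
        total = best_len
        for _ in range(k):
            if i + total >= len(s1) or best_j + total >= len(s2):
                break
            total += 1  # step over the mismatching position
            total += exact_lcp(s1, i + total, s2, best_j + total)
        lengths.append(total)
    return lengths
```

Because only one alignment per position is extended, the result can be shorter than the true optimum: in `greedy_k_mismatch_lengths("abcde", "abxde", 1)` the position of `c` gets length 1, although aligning it against `x` would give 3.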