Absent Subsequences in Words

  title={Absent Subsequences in Words},
  author={Maria Kosche and Tore Ko{\ss} and Florin Manea and Stefan Siemer},
An absent factor of a string w is a string u which does not occur as a contiguous substring (a.k.a. factor) inside w. We extend this well-studied notion and define absent subsequences: a string u is an absent subsequence of a string w if u does not occur as subsequence (a.k.a. scattered factor) inside w. Of particular interest to us are minimal absent subsequences, i.e., absent subsequences whose every subsequence is not absent, and shortest absent subsequences, i.e., absent subsequences of… Expand

Figures from this paper


Directed acyclic subsequence graph - Overview
An automaton is built that accepts all subsequences of given texts and an upper bound for its number of states is proved, which is called the Directed Acyclic Subsequence Graph (DASG). Expand
Absent words in a sliding window with applications
An algorithm is proposed that maintains the set of MAWs of a fixed-length window sliding over y online and applies this algorithm to the approximate pattern-matching problem under the Length Weighted Index distance, resulting in an online O(σ|y|)-time algorithm for finding approximate occurrences of a word x in y. Expand
Parallelising the Computation of Minimal Absent Words
Experimental results show that a multiprocessing implementation of this algorithm can accelerate the overall computation by more than a factor of two compared to state-of-the-art approaches, and it is shown that the implementation achieves near-optimal speed-ups. Expand
Constructing Strings Avoiding Forbidden Substrings
These algorithms are motivated by data privacy, and in particular, by the data sanitization process, and can be directly applied to solve the reachability and shortest path problems in complete de Bruijn graphs in the presence of forbidden edges or of forbidden paths. Expand
Linear-time computation of minimal absent words using suffix array
A new linear-time and linear-space algorithm for the computation of minimal absent words based on the suffix array is presented and experimental results show that this implementation outperforms the one by Pinho et al. Expand
Words and forbidden factors
It is proved that these two functions characterize, up to the automorphism exchanging the two letters, the language of factors of each single infinite Sturmian word. Expand
Using minimal absent words to build phylogeny
This paper presents an efficient method for computing the minimal absent words of bounded length using a trie of bounded depth, representing bounded length factors, and provides a linear-time algorithm with less memory usage than previous solutions. Expand
Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets
The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges and it is shown that the set MAW(y) of all minimal absent words of y can be computed in optimal O-time time and working space for integer alphabets. Expand
Alignment-free sequence comparison using absent words
This paper presents the first linear-time and linear-space algorithm to compare two sequences by considering all their minimal absent words, and presents an algorithm that computes the largest integer for which all factors of x of that length occur in some minimal absent word of x in time and space O ( n ) . Expand
On Extended Special Factors of a Word
This article shows that there is no constant \(c<3\) such that \(\rho (n) \le cn\), and constructs a data structure for computing minimal absent words of a specific length in optimal time for integer alphabets. Expand