Given a static array of n totally ordered objects, the range minimum query problem is to build an additional data structure that answers subsequent on-line queries of the form "what is the position of a minimum element in the sub-array ranging from i to j?" efficiently. We focus on two settings, where (1) the input array is available at query …
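To make the query form concrete, here is a minimal sketch of one classical way to answer range minimum queries, the sparse table. It is not the structure proposed in these papers: it needs O(n log n) preprocessing time and words of space rather than linear or succinct bounds, but it does answer each query in constant time. The names `build_sparse_table` and `rmq` are illustrative choices, and indices are 0-based and inclusive.

```python
def build_sparse_table(a):
    """table[k][i] = position of a minimum of a[i .. i + 2**k - 1]."""
    n = len(a)
    table = [list(range(n))]          # windows of length 1
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            # Ties go to the left candidate.
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(a, table, i, j):
    """Position of a minimum element in a[i .. j], with 0 <= i <= j < len(a)."""
    k = (j - i + 1).bit_length() - 1  # largest k with 2**k <= j - i + 1
    left = table[k][i]                # covers a[i .. i + 2**k - 1]
    right = table[k][j - (1 << k) + 1]  # covers a[j - 2**k + 1 .. j]
    return left if a[left] <= a[right] else right


a = [3, 1, 4, 1, 5, 9, 2, 6]
table = build_sparse_table(a)
print(rmq(a, table, 2, 5))  # position of a minimum in a[2..5]
```

The two windows used in `rmq` overlap but together cover exactly a[i .. j], which is why two table lookups and one comparison suffice.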
The Range-Minimum-Query-Problem is to preprocess an array of length n in O(n) time such that all subsequent queries asking for the position of a minimal element between two specified indices can be answered in constant time. This problem was first solved by Berkman and Vishkin [1], and Sadakane [2] gave the first succinct data structure that uses 4n + o(n) …
We propose a new algorithmic framework that solves frequency related data mining queries on databases of strings in optimal time, i.e., in time linear in the input and the output size. The additional space is linear in the input size. Our framework can be used to mine frequent strings, emerging strings and strings that pass other statistical tests, e.g., …
We show how to modify the linear-time construction algorithm for suffix arrays based on induced sorting (Nong et al., DCC'09) such that it computes the array of longest common prefixes (LCP-array) as well. Practical tests show that this outperforms recent LCP-array construction algorithms (Gog and Ohlebusch, ALENEX'11).
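For readers unfamiliar with the LCP-array, the sketch below shows what it contains. It does not reproduce the induced-sorting method of the abstract above: the suffix array is built by naive sorting, and the LCP-array is then derived with the well-known linear-time algorithm of Kasai et al. The function names are illustrative, and `lcp[r]` is taken to be the length of the longest common prefix of the suffixes at ranks `r - 1` and `r`, with `lcp[0] = 0`.

```python
def suffix_array(s):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """LCP-array from text and suffix array (Kasai et al. style)."""
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r                   # rank[p] = position of suffix p in sa
    lcp = [0] * n
    h = 0                             # common-prefix length carried over
    for i in range(n):                # process suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]       # suffix preceding suffix i in sa
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                # next suffix shares at least h - 1
        else:
            h = 0
    return lcp


text = "banana"
sa = suffix_array(text)
print(sa)                   # [5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))  # [0, 1, 3, 0, 0, 2]
```

The naive sort is only for illustration; a linear-time suffix sorter would replace `suffix_array` in practice.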
It is widely assumed that O(m + lg σ) is the best one can do for finding a pattern of length m in a compacted trie storing strings over an alphabet of size σ, if one insists on linear-size data structures and deterministic worst-case running times [Cole et al., ICALP'06]. In this article, we first show that a rather straightforward combination of well-known …
The Range-Minimum-Query-Problem is to preprocess an array such that the position of the minimum element between two specified indices can be obtained efficiently. We present a direct algorithm for the general RMQ-problem with linear preprocessing time and constant query time, without making use of any dynamic data structure. It consumes less than half of …
Suffix trees are among the most important data structures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix tree representation could fit in a faster memory, …
Surprisingly enough, it is not yet known how to build directly a suffix array that indexes just the k positions at word-boundaries of a text T[1, n], taking O(n) time and O(k) space in addition to T. We propose a class-note solution to this problem that achieves such optimal time and space bounds. Word-based versions of indexes achieving the same …
Suffix trees are one of the most beautiful data structures ever invented. Loved by stringologists for their nice theoretical properties. Loved by bioinformaticians for their practical applications. Hated by everyone for their huge space consumption. E.g. the Human Genome, 3 × 10^9 base pairs, easily fits in a …
We consider full text index construction in external memory (EM). Our first contribution is an inducing algorithm for suffix arrays in external memory, which runs in sorting complexity. Practical tests show that this algorithm outperforms the previous best EM suffix sorter [Dementiev et al., JEA 2008] by a factor of about two in time and I/O volume. Our …