The proliferation of online text, such as on the World Wide Web and in databases, motivates the need for space-efficient index methods that support fast search. Consider a text T of n binary symbols to index. Given any query pattern P of m binary symbols, the goal is to search for P in T quickly, with T being fully scanned only once, namely, when the …
We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In a short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search …
We report on a new and improved version of high-order entropy-compressed suffix arrays, which has theoretical performance guarantees similar to those in our earlier work [16], yet represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20% of the …
We consider the problem of representing, in a compressed format, a bit-vector S of m bits with n 1s, supporting the following operations, where b ∈ {0, 1}: • rank_b(S, i) returns the number of occurrences of bit b in the prefix S[1..i]; • select_b(S, i) returns the position of the i-th occurrence of bit b in S. Such a data structure is called fully …
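The two operations defined in this abstract can be illustrated with a minimal, uncompressed sketch. It is a naive linear-time reference, not the compressed structure the paper studies; the function names `rank` and `select`, the list-of-ints representation, and the sample bit-vector are assumptions made only for illustration.

```python
def rank(S, b, i):
    """Count occurrences of bit b in the prefix S[1..i] (1-based, inclusive)."""
    return sum(1 for bit in S[:i] if bit == b)


def select(S, b, i):
    """Return the 1-based position of the i-th occurrence of bit b in S.

    Returns None when S contains fewer than i occurrences of b.
    """
    count = 0
    for pos, bit in enumerate(S, start=1):
        if bit == b:
            count += 1
            if count == i:
                return pos
    return None


# Hypothetical example bit-vector: m = 7 bits, n = 4 ones.
S = [1, 0, 1, 1, 0, 0, 1]

print(rank(S, 1, 4))    # ones in S[1..4] = [1, 0, 1, 1]
print(select(S, 0, 2))  # position of the second 0
```

Both functions scan the vector, so each query costs time linear in m. The sketch only pins down the semantics of the two operations; it makes no attempt at compression or fast queries.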
In a previous work [S], we proposed a text indexing data structure for secondary storage, which we called SB-tree, that combines the best of B-trees and suffix arrays, overcoming the limitations of inverted files, suffix arrays, suffix trees, and prefix B-trees. In this paper we study the performance of SB-trees in a practical setting, performing a set of …
We address the issue of efficiently searching external dynamic data structures for strings, introducing the External Dynamic Substring Search problem. Consider a set A of (external) text strings kept in secondary storage. The set A can be changed dynamically by inserting or deleting strings, and searched on-line to find all the occurrences of an …
We investigate the problem of determining the basis of motifs (a form of repeated patterns with don't cares) in an input string. We give new upper and lower bounds on the problem, introducing a new notion of basis that is provably smaller than (and contained in) previously defined ones. Our basis can be computed in less time and space, and is still able to …