The proliferation of online text, such as on the World Wide Web and in databases, motivates the need for space-efficient index methods that support fast search. Consider a text T of n binary symbols to index. Given any query pattern P of m binary symbols, the goal is to search for P in T quickly, with T being fully scanned only once, namely, when the ...
We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In short, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search ...
We report on a new and improved version of high-order entropy-compressed suffix arrays, which has theoretical performance guarantees similar to those in our earlier work [16], yet represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20% of the ...
We consider the problem of representing, in a compressed format, a bit-vector S of m bits with n 1s, supporting the following operations, where b ∈ {0, 1}:
• rank_b(S, i) returns the number of occurrences of bit b in the prefix S[1..i];
• select_b(S, i) returns the position of the ith occurrence of bit b in S.
Such a data structure is called fully ...
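To make the two operations concrete, here is a minimal, uncompressed sketch in Python. It only illustrates the definitions of rank and select with a linear scan; it is not the compressed representation the abstract describes. The function names `rank` and `select`, the list-of-ints encoding of S, and the choice to return `None` when the ith occurrence does not exist are all assumptions made for this illustration.

```python
def rank(bits, b, i):
    """Count occurrences of bit b in the prefix S[1..i] (1-indexed, inclusive).

    Naive linear scan, for illustration only.
    """
    return sum(1 for x in bits[:i] if x == b)


def select(bits, b, i):
    """Return the 1-indexed position of the i-th occurrence of bit b in S.

    Returns None if S has fewer than i occurrences of b (an assumed convention).
    """
    count = 0
    for pos, x in enumerate(bits, start=1):
        if x == b:
            count += 1
            if count == i:
                return pos
    return None


# Example bit-vector S = 10110010 (positions 1..8).
S = [1, 0, 1, 1, 0, 0, 1, 0]
print(rank(S, 1, 4))    # 1s in S[1..4]
print(select(S, 1, 3))  # position of the third 1
print(select(S, 0, 2))  # position of the second 0
```

The two operations are inverses in the sense that `rank(S, b, select(S, b, i)) == i` whenever the ith occurrence of b exists.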
In a previous work [S], we proposed a text indexing data structure for secondary storage, which we called SB-tree, that combines the best of B-trees and suffix arrays, overcoming the limitations of inverted files, suffix arrays, suffix trees, and prefix B-trees. In this paper we study the performance of SB-trees in a practical setting, performing a set of ...
We address the issue of efficiently searching external dynamic data structures for strings, introducing the External Dynamic Substring Search problem. Consider a set A of (external) text strings kept in secondary storage. The set A can be dynamically changed by inserting or deleting strings, and searched online to find all the occurrences of an ...