The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in n lg |Σ| bits by encoding …
We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In short, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search …
We report on a new and improved version of high-order entropy-compressed suffix arrays, which has theoretical performance guarantees similar to those in our earlier work [16], yet represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20% of the …
We consider the problem of representing, in a compressed format, a bit-vector S of m bits with n 1s, supporting the following operations, where b ∈ {0, 1}: rank_b(S, i) returns the number of occurrences of bit b in the prefix S[1..i]; select_b(S, i) returns the position of the ith occurrence of bit b in S. Such a data structure is called fully …
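To make the two operations concrete, here is a minimal Python sketch of rank and select over an uncompressed list of bits. It only illustrates the definitions (1-indexed positions, linear-time scans); it is not the compressed structure the abstract describes, and the function names and the sample vector S are assumptions chosen for illustration.

```python
def rank(bits, b, i):
    """Count occurrences of bit b in the prefix bits[1..i] (1-indexed, inclusive)."""
    return sum(1 for x in bits[:i] if x == b)


def select(bits, b, i):
    """Return the 1-indexed position of the i-th occurrence of bit b, or None if there is none."""
    count = 0
    for pos, x in enumerate(bits, start=1):
        if x == b:
            count += 1
            if count == i:
                return pos
    return None


# Hypothetical sample bit-vector with m = 8 bits and n = 4 ones.
S = [1, 0, 1, 1, 0, 0, 1, 0]

print(rank(S, 1, 4))    # ones in S[1..4]
print(select(S, 1, 3))  # position of the third 1
print(select(S, 0, 2))  # position of the second 0
```

The two operations are inverses in the sense that rank(S, b, select(S, b, i)) equals i whenever the ith occurrence of b exists.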
In this paper we address for the first time the I/O complexity of the problem of sorting strings in external memory, which is a fundamental component of many large-scale text applications. In the standard unit-cost RAM comparison model, the complexity of sorting K strings of total length N is Θ(K log₂ K + N). By analogy, in the external memory (or I/O) …
In a previous work [S], we proposed a text indexing data structure for secondary storage, which we called SB-tree, that combines the best of B-trees and suffix arrays, overcoming the limitations of inverted files, suffix arrays, suffix trees, and prefix B-trees. In this paper we study the performance of SB-trees in a practical setting, performing a set of …
The proliferation of online text, such as on the World Wide Web and in databases, motivates the need for space-efficient index methods that support fast search. Consider a text T of n binary symbols to index. Given any query pattern P of m binary symbols, the goal is to search for P in T quickly, with T being fully scanned only once, namely, when the …