Roberto Grossi

The proliferation of online text, such as on the World Wide Web and in databases, motivates the need for space-efficient index methods that support fast search. Consider a text T of n binary symbols to index. Given any query pattern P of m binary symbols, the goal is to search for P in T quickly, with T being fully scanned only once, namely, when the(More)
We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In a short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search(More)
We report on a new and improved version of high-order entropy-compressed suffix arrays, which has theoretical performance guarantees similar to those in our earlier work [16], yet represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20% of the(More)
We investigate the problem of determining the basis of motifs (a form of repeated patterns with don’t cares) in an input string. We give new upper and lower bounds on the problem, introducing a new notion of basis that is provably smaller than (and contained in) previously defined ones. Our basis can be computed in less time and space, and is still able to(More)
Tries are popular data structures for storing a set of strings, where common prefixes are represented by common root-to-node paths. More than 50 years of usage have produced many variants and implementations to overcome some of their limitations. We explore new succinct representations of path-decomposed tries and experimentally evaluate the corresponding(More)
Motif inference represents one of the most important areas of research in computational biology, and one of its oldest ones. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms, or in relation to the biological questions that involve finding such motifs. Two main types of(More)
In a previous work [S], we proposed a text indexing data structure for secondary storage, which we called SB-tree, that combines the best of B-trees and suffix arrays, overcoming the limitations of inverted files, suffix arrays, suffix trees, and prefix B-trees. In this paper we study the performance of SB-trees in a practical setting, performing a set of(More)