Learn More
We present a new indexing method for the approximate string matching problem. The method is based on a suffix array combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the average retrieval time is Ç´Ò ÐÓÓ Òµ, for some ¼ that depends on the error fraction tolerated « and the alphabet size. It is shown that ½ for(More)
Searching in metric spaces is the problem of, given a set of elements and a distance function deened among them, nd all the elements close enough to a given query element. For eeciency, they try to minimize the number of evaluations of the distance function. This problem has a large number of applications, and a well-known particular case is that of vector(More)
We present a full inverted index for exact and approximate string matching in large texts. The index is composed of a table containing the vocabulary of words of the text and a list of positions in the text corresponding to each word. The size of the table of words is usually much less than 1% of the text size and hence can be kept in main memory, where(More)
Research on succinct data structures has made significant progress in recent years. An essential building block of many of those techniques is a data structure to perform rank and select operations over a bit array. The first operation tells how many bits are set up to some position, and the second the position of the i-th bit set. Albeit there exist(More)
We present a technique to build an index based on suux arrays for compressed texts. We also propose a compression scheme for textual databases based on words that generates a compression code that preserves the lexicographical ordering of the text words. As a consequence it permits the sorting of the compressed strings to generate the suux array without(More)
We present a very simple and eecient algorithm for on-line multiple approximate string matching. It uses a previously known counting-based lter 9] that searches for a single pattern by quickly discarding uninteresting parts of the text. Our multi-pattern algorithm is based on the simulation of many parallel lters using bits of the computer word. Our average(More)
Given a text T[1..u] over an alphabet of size σ, the full-text search problem consists in finding the occ occurrences of a given pattern P[1..m] in T. In indexed text searching we build an index on T to improve the search time, yet increasing the space requirement. The current trend in indexed text searching is that of compressed full-text self-indices,(More)
We present a data structure that stores a sequence s[1..n] over alphabet [1..σ] in $n\mathcal{H}_{0}(s) + o(n)(\mathcal {H}_{0}(s){+}1)$ bits, where $\mathcal{H}_{0}(s)$ is the zero-order entropy of s. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worst-case(More)