Learn More
The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric(More)
We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its(More)
We present nrgrep (\nondeterministic reverse grep"), a new pattern matching tool designed for eecient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit-parallel simulation of a nondeterministic suux automaton. As a result, nrgrep can nd from simple(More)
Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into <i>self-indexes</i>,(More)
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However , indexes lose their efficiency as the intrinsic data dimensionality increases. In this paper we present a simple index(More)
We introduce a new probabilistic proximity search algorithm for range and A"-nearest neighbor (A"-NN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high dimensional, as is the case in many pattern recognition tasks. This, for example, renders(More)
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function deened among them, which satisses the triangular inequality. The goal is, given a set of objects and a query, retrieve those objects close enough to the query. The number of distances computed to achieve this goal is the(More)
A succinct full-text self-index is a data structure built on a text T = t1t2. .. tn, which takes little space (ideally close to that of the compressed text), permits efficient search for the occurrences of a pattern P = p1p2. .. pm in T , and is able to reproduce any text substring, so the self-index replaces the text. Several remarkable self-indexes have(More)