- Edgar Chávez, Gonzalo Navarro, Ricardo A. Baeza-Yates, José L. Marroquín
- ACM Comput. Surv.
- 2001

The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric…

- Gonzalo Navarro
- ACM Comput. Surv.
- 2001

We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming an increasingly relevant issue for many fast-growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its…
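
A minimal Python sketch of the classical dynamic-programming baseline for online approximate search under edit distance (often credited to Sellers), included only to make the problem concrete; it is not a specific algorithm from the survey, and the name `approx_search` is hypothetical. It keeps one column of the DP matrix, where entry `i` is the smallest edit distance between the first `i` pattern characters and some substring of the text ending at the current position, and reports every position where the whole pattern fits within `k` errors.

```python
def approx_search(text: str, pattern: str, k: int) -> list[int]:
    """Hypothetical helper: return the 0-based text positions j at which some
    substring ending at j is within edit distance k of pattern (O(mn) DP)."""
    m = len(pattern)
    # col[i]: smallest edit distance between pattern[:i] and a substring of
    # the text ending at the current position (before any text is read: i).
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        diag = col[0]  # col[i-1] from the previous text position
        col[0] = 0     # the empty pattern prefix matches anywhere at cost 0
        for i in range(1, m + 1):
            left = col[i]  # col[i] from the previous text position
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(diag + cost,     # match or substitution
                         left + 1,        # extra text character
                         col[i - 1] + 1)  # skipped pattern character
            diag = left
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_search("xabcx", "abd", 1))
```

Here `approx_search("xabcx", "abd", 1)` reports end positions 2 and 3 ("ab" with one deletion, "abc" with one substitution). The O(mn) time of this scan is the reference point that faster online methods try to improve on.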

- Gonzalo Navarro
- Softw., Pract. Exper.
- 2001

We present nrgrep ("nondeterministic reverse grep"), a new pattern matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and GNU grep, nrgrep is based on a single and uniform concept: the bit-parallel simulation of a nondeterministic suffix automaton. As a result, nrgrep can find from simple…
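
The core idea named here, bit-parallel simulation of a nondeterministic suffix automaton, can be sketched in its simplest form: BNDM-style exact search for a single pattern. This is an illustration in Python, not nrgrep's code (which handles far richer patterns), and `bndm_search` is a hypothetical name. Each text window of length m is read right to left; `D` holds one bit per pattern position marking where the characters read so far still occur as a factor of the pattern, and its top bit signals that they form a pattern prefix, which determines how far the window can safely shift.

```python
def bndm_search(text: str, pattern: str) -> list[int]:
    """Hypothetical helper: exact search with a bit-parallel nondeterministic
    suffix automaton (BNDM-style). Returns 0-based starts. Assumes pattern != ""."""
    m, n = len(pattern), len(text)
    mask = (1 << m) - 1
    top = 1 << (m - 1)
    # B[c] has bit (m-1-i) set for every pattern position i holding c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    starts = []
    pos = 0
    while pos <= n - m:
        j, last, D = m, m, mask  # all automaton states active before reading
        while D:
            # Read the window text[pos:pos+m] backward, one character per step.
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & top:  # the characters read so far form a pattern prefix
                if j > 0:
                    last = j  # shift target: align with the longest such prefix
                else:
                    starts.append(pos)  # whole window read: an occurrence
            D = (D << 1) & mask
        pos += last
    return starts


if __name__ == "__main__":
    print(bndm_search("abracadabra", "abra"))
```

Python integers remove the m ≤ w limit that a C version working on machine words would have; the price is speed, so the sketch only shows the logic.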

- Edgar Chávez, Gonzalo Navarro
- Pattern Recognition Letters
- 2005

The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However, indexes lose their efficiency as the intrinsic data dimensionality increases. In this paper we present a simple index…
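
One common way to quantify the intrinsic dimensionality mentioned here is the estimate ρ = μ²/(2σ²) defined in the Chávez, Navarro, Baeza-Yates, and Marroquín survey, where μ and σ² are the mean and variance of the histogram of distances: the more the histogram concentrates around its mean, the larger ρ and the less effective distance-based pruning becomes. The Python sketch below estimates ρ from random pairs of objects; the function names, the sample size, and the uniform-cube test data are assumptions for illustration.

```python
import math
import random
import statistics


def intrinsic_dimensionality(points, dist, samples=2000, seed=0):
    """Hypothetical helper: estimate rho = mu^2 / (2 sigma^2) from the
    distances between randomly chosen pairs of distinct objects."""
    rng = random.Random(seed)
    ds = []
    for _ in range(samples):
        a, b = rng.sample(range(len(points)), 2)
        ds.append(dist(points[a], points[b]))
    mu = statistics.mean(ds)
    var = statistics.pvariance(ds, mu)
    return mu * mu / (2 * var)


def uniform_cube(n, dim, seed=1):
    """Assumed test data: n points drawn uniformly from the unit cube."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


if __name__ == "__main__":
    for dim in (2, 8, 32):
        rho = intrinsic_dimensionality(uniform_cube(500, dim), math.dist)
        print(dim, round(rho, 1))
```

For uniform vectors, ρ should grow with the dimension as the distance histogram concentrates; the exact values depend on the seed and sample size.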

- Gonzalo Navarro, Veli Mäkinen
- ACM Comput. Surv.
- 2007

Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes,…
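
For context, the classical uncompressed full-text index is the suffix array: the text's suffix start positions in lexicographic order, searched by binary search. The Python sketch below (hypothetical names, naive construction by sorting) shows why substring search is fast and why space is the issue: besides the text, it stores one integer of about log n bits per text position.

```python
def build_suffix_array(text):
    """Hypothetical helper: naive construction, sorting start positions
    by the suffix each one begins (fine for small examples only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the block of suffixes that start with pattern."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    sa = build_suffix_array("banana")
    print(sa, find_occurrences("banana", sa, "ana"))
```

Truncating suffixes to m characters preserves their sorted order, which is what makes the two binary searches valid.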

- Gonzalo Navarro
- VLDB J.
- 1999

We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them, which satisfies the triangle inequality. The goal is, given a set of objects and a query, to retrieve those objects close enough to the query. The number of distances computed to achieve this goal is the…
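
The triangle inequality is what lets a metric index avoid distance computations. The Python sketch below illustrates this with a simple pivot table, a different and much simpler structure than the one proposed in the paper; the class name, pivot choice, and test points are hypothetical. Distances from every object to a few pivots are precomputed, and at query time an object u is discarded whenever |d(q,p) - d(u,p)| > r for some pivot p, because the triangle inequality guarantees d(q,u) ≥ |d(q,p) - d(u,p)|.

```python
import math


class PivotTable:
    """Hypothetical pivot-table index for range queries in a metric space."""

    def __init__(self, objects, pivots, dist):
        self.objects = objects
        self.pivots = pivots
        self.dist = dist
        # table[u][j] = dist(objects[u], pivots[j]), computed once at build time.
        self.table = [[dist(o, p) for p in pivots] for o in objects]

    def range_query(self, q, r):
        """Return (indices of objects within r of q, distances computed)."""
        dq = [self.dist(q, p) for p in self.pivots]
        evals = len(self.pivots)
        result = []
        for u, row in enumerate(self.table):
            # d(q,u) >= |d(q,p) - d(u,p)| for every pivot p, so a large gap
            # proves u is outside the query ball without computing d(q,u).
            if any(abs(a - b) > r for a, b in zip(dq, row)):
                continue
            evals += 1
            if self.dist(q, self.objects[u]) <= r:
                result.append(u)
        return result, evals


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 5), (6, 5), (10, 0)]
    index = PivotTable(pts, pivots=[(0, 0), (10, 0)], dist=math.dist)
    print(index.range_query((0.5, 0), 1.0))
```

With these points, pivots (0,0) and (10,0), query (0.5,0) and radius 1, three of the five objects are discarded by the pivot test, so only 4 distances are computed (2 to pivots, 2 to candidates) instead of 5.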

- Edgar Chávez, Karina Figueroa, Gonzalo Navarro
- IEEE Transactions on Pattern Analysis and Machine…
- 2008

We introduce a new probabilistic proximity search algorithm for range and k-nearest neighbor (k-NN) searching in both coordinate and metric spaces. Although solutions exist for these problems, they boil down to a linear scan when the space is intrinsically high dimensional, as is the case in many pattern recognition tasks. This, for example, renders…

- Ricardo A. Baeza-Yates, Gonzalo Navarro
- Algorithmica
- 1999

We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a nondeterministic finite automaton built from the pattern and using the text as input. This simulation uses bit operations on a RAM machine with word length $w = \Omega(\log n)$ bits, where $n$ is the text size. This is essentially similar to the model used…
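
Bit-parallel NFA simulation for approximate matching can be sketched with the well-known row-wise scheme of Wu and Manber, which keeps one m-bit word per error level. This is a related technique shown for illustration; the paper lays out the automaton in machine words differently, and `wu_manber` is a hypothetical name. Python integers stand in for w-bit words, hiding the m ≤ w restriction a C version would have. It should report the same end positions as the dynamic-programming definition of edit distance.

```python
def wu_manber(text: str, pattern: str, k: int) -> list[int]:
    """Hypothetical helper: row-wise bit-parallel simulation of the
    approximate-matching NFA. Returns 0-based end positions of matches
    with at most k edit errors. Assumes pattern != ""."""
    m = len(pattern)
    mask = (1 << m) - 1
    accept = 1 << (m - 1)
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)
    # R[d] bit i: pattern[:i+1] matches a suffix of the text read so far
    # with <= d errors. Initially d deletions cover the first d characters.
    R = [((1 << d) - 1) & mask for d in range(k + 1)]
    ends = []
    for j, c in enumerate(text):
        b = B.get(c, 0)
        prev_old = R[0]  # R[d-1] before this text character
        R[0] = ((R[0] << 1) | 1) & b & mask
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & b)  # match
                    | prev_old              # extra text character
                    | (prev_old << 1)       # substitution
                    | (R[d - 1] << 1)       # skipped pattern character
                    | 1) & mask             # first character used with an error
            prev_old = old
        if R[k] & accept:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(wu_manber("xabcx", "abd", 1))
```

Each text character costs O(k) word operations instead of O(mk) cell updates, which is the kind of speedup bit-parallel simulation aims for.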

- Veli Mäkinen, Gonzalo Navarro
- Nord. J. Comput.
- 2005

A succinct full-text self-index is a data structure built on a text $T = t_1 t_2 \cdots t_n$, which takes little space (ideally close to that of the compressed text), permits efficient search for the occurrences of a pattern $P = p_1 p_2 \cdots p_m$ in $T$, and is able to reproduce any text substring, so the self-index replaces the text. Several remarkable self-indexes have…
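
To make the self-index idea concrete, here is a toy Python sketch of a Burrows-Wheeler-based index with FM-index backward search. It counts pattern occurrences and rebuilds the text from the BWT and two auxiliary tables alone, so the original text is not kept. It is not the paper's structure. The occurrence table is stored uncompressed (O(nσ) integers), the sentinel `"\0"` is assumed absent from the text, and the function names are hypothetical. Efficient extraction of an arbitrary substring would also need sampled suffix-array positions, which are omitted here.

```python
def build_fm_index(text):
    """Hypothetical helper: toy FM-index (BWT, C table, occurrence table)."""
    t = text + "\0"  # assumed sentinel: absent from text, smallest character
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive, only for the build
    bwt = "".join(t[i - 1] for i in sa)  # t[-1] is the sentinel when i == 0
    alphabet = sorted(set(t))
    # C[c] = number of characters in t strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += t.count(c)
    # occ[c][i] = occurrences of c in bwt[:i] (uncompressed, toy only).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return bwt, C, occ  # the text itself is not stored


def fm_count(fm, pattern):
    """Backward search: narrow the suffix range [lo, hi) one character at a time."""
    bwt, C, occ = fm
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


def fm_extract_text(fm):
    """Rebuild the text right to left with the LF mapping."""
    bwt, C, occ = fm
    out, i = [], 0  # row 0 is the suffix "\0"; bwt[0] is the last character
    for _ in range(len(bwt) - 1):
        c = bwt[i]
        out.append(c)
        i = C[c] + occ[c][i]  # LF: row of the suffix that starts with c
    return "".join(reversed(out))


if __name__ == "__main__":
    fm = build_fm_index("banana")
    print(fm_count(fm, "ana"), fm_extract_text(fm))
```

Real self-indexes keep this same search logic but replace `bwt` and `occ` with compressed sequences that support rank queries, which is where the space savings come from.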