# NR‐grep: a fast and flexible pattern‐matching tool

@article{Navarro2001NRgrepAF, title={NR‐grep: a fast and flexible pattern‐matching tool}, author={Gonzalo Navarro}, journal={Software: Practice and Experience}, year={2001}, volume={31} }

We present nrgrep (‘non‐deterministic reverse grep’), a new pattern‐matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit‐parallel simulation of a non‐deterministic suffix automaton. As a result, nrgrep can find from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity…
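To make the central idea concrete, here is a minimal sketch of BNDM-style exact matching, the bit-parallel simulation of a nondeterministic suffix automaton applied to a plain string. It is an illustration under stated assumptions, not nrgrep's actual code: the function name `bndm_search` and its interface are hypothetical, and it covers only simple patterns (no character classes, regular expressions, or errors).

```python
def bndm_search(pattern: str, text: str) -> list[int]:
    """Return start offsets of all occurrences of pattern in text.

    Hypothetical helper for illustration only; not nrgrep's interface.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # masks[c] has bit (m - 1 - i) set when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1   # all m automaton states active
    high = 1 << (m - 1)   # set when the text read so far is a pattern prefix
    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m             # scan the window text[pos:pos+m] right to left
        last = m          # safe shift if no pattern prefix is seen
        state = full
        while state != 0:
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:
                if j > 0:
                    last = j              # a prefix starts at window offset j
                else:
                    occurrences.append(pos)  # the whole window matched
                    break
            # Python integers are unbounded, so truncate to m bits by hand.
            state = (state << 1) & full
        pos += last
    return occurrences


print(bndm_search("abra", "abracadabra"))
```

The window is read backwards, and the scan stops as soon as the text read is no longer a factor of the pattern, which is what lets the search skip text characters. In a language with fixed-width machine words this simple form needs the pattern to fit in one word; Python integers hide that limit.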

#### 118 Citations

PatMatch: a program for finding patterns in peptide and nucleotide sequences

- Biology, Computer Science
- Nucleic Acids Res.
- 2005

Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small…

Negative Factor: Improving Regular-Expression Matching in Strings

- Computer Science
- TODS
- 2016

An efficient algorithm that utilizes negative factors to prune candidates, then improves it by using bit operations to process negative factors in parallel, which shows that negative factors, when used with necessary factors, can achieve much better pruning power.

Improving regular-expression matching on strings using negative factors

- Computer Science
- SIGMOD '13
- 2013

An efficient algorithm that utilizes negative factors to prune candidates, then improves it by using bit operations to process negative factors in parallel, showing that negative factors, when used together with necessary factors (substrings that must appear in each answer), can achieve much better pruning power.

Fast and flexible string matching by combining bit-parallelism and suffix automata

- Computer Science
- JEAL
- 2000

A new automaton to recognize suffixes of patterns with classes of characters is introduced, which seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area.

Fast and simple character classes and bounded gaps pattern matching, with application to protein searching

- Computer Science
- RECOMB
- 2001

Two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques are designed, along with a criterion based on the form of the CBG to choose a priori the faster of the two.

Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching

- Mathematics, Computer Science
- J. Comput. Biol.
- 2003

Two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques are designed, and a criterion based on the form of the CBG to choose a priori the faster of the two is proposed.

From Nondeterministic Suffix Automaton to Lazy Suffix Tree

- Computer Science
- Algorithms and Applications
- 2010

This paper takes the underlying nondeterministic suffix automaton and applies it to the text instead of to the pattern, and shows how the algorithm can be easily adapted to construct the suffix tree of T in a lazy manner.

New Techniques for Regular Expression Searching

- Mathematics, Computer Science
- Algorithmica
- 2004

Two new techniques for regular expression searching are presented, one able and one unable to skip text characters, which permit fast searching for regular expressions, normally faster than any existing algorithm.

Survey of Global Regular Expression Print (grep) Tools

- 2004

The UNIX grep utility marked the birth of global regular expression print (GREP) tools. Searching for patterns in text is an important operation in a number of domains, including program comprehension…

Bitwise data parallelism in regular expression matching

- Computer Science
- 2014 23rd International Conference on Parallel Architecture and Compilation (PACT)
- 2014

A new parallel algorithm for regular expression matching is developed and applied to the classical grep (global regular expression print) problem and can substantially outperform traditional grep implementations based on NFAs, DFAs or backtracking.

#### References

Fast and flexible string matching by combining bit-parallelism and suffix automata

- Computer Science
- JEAL
- 2000

A new automaton to recognize suffixes of patterns with classes of characters is introduced, which seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area.

A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching

- Computer Science
- CPM
- 1998

A new algorithm for string matching called BNDM, which is the bit-parallel simulation of a known (but recent) algorithm called BDM, and which can be extended to handle classes of characters in the pattern and in the text, multiple patterns and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility.

Faster Approximate String Matching

- Mathematics, Computer Science
- Algorithmica
- 1999

The algorithm is based on the simulation of a nondeterministic finite automaton built from the pattern and using the text as input and it is shown that the algorithms are among the fastest for typical text searching, being the fastest in some cases.

Compact DFA Representation for Fast Regular Expression Search

- Computer Science, Mathematics
- WAE
- 2001

This work presents a new technique to encode a deterministic finite automaton (DFA) using (m+1)(2m-1 +|Σ|) bits, where m is the number of characters (excluding operator symbols) in the regular expression and Σ is the alphabet.

GLIMPSE: A Tool to Search Through Entire File Systems

- Computer Science
- USENIX Winter
- 1994

Glimpse is particularly designed for personal information, such as one's own file system, that should support many types of queries, flexible interaction, low overhead, and customization. All these are important features of glimpse.

Fast Pattern Matching in Strings

- Mathematics, Computer Science
- SIAM J. Comput.
- 1977

An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings, showing that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.

A fast bit-vector algorithm for approximate string matching based on dynamic programming

- Mathematics, Computer Science
- JACM
- 1999

An algorithm of comparable simplicity that requires only O(kn/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the approximate string matching problem, and is found to be more efficient than the previous results for many choices of k and small m.

From Regular Expressions to Deterministic Automata

- Computer Science, Mathematics
- Theor. Comput. Sci.
- 1986

The main theorem allows an elegant algorithm to be refined into an efficient one based on ‘marking of’ regular expressions based on derivatives of regular expressions, which constructs an automaton for the marked expression.

A fast string searching algorithm

- Computer Science
- CACM
- 1977

The algorithm has the unusual property that, in most cases, not all of the first *i* characters of the string being searched are inspected.

Fast text searching: allowing errors

- Computer Science
- CACM
- 1992

The string-matching problem is a very common problem; there are many extensions to this problem; for example, it may be looking for a set of patterns, a pattern with "wild cards," or a regular expression.