# NR‐grep: a fast and flexible pattern‐matching tool

@article{Navarro2001NRgrepAF,
title={NR‐grep: a fast and flexible pattern‐matching tool},
author={Gonzalo Navarro},
journal={Software: Practice and Experience},
year={2001},
volume={31}
}
• G. Navarro
• Published 30 October 2001
• Computer Science
• Software: Practice and Experience
We present nrgrep (‘non‐deterministic reverse grep’), a new pattern‐matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit‐parallel simulation of a non‐deterministic suffix automaton. As a result, nrgrep can find from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity…
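
The core idea named in the abstract, the bit-parallel simulation of a non-deterministic suffix automaton, can be illustrated for the simplest case of exact matching of a single string. The sketch below follows the BNDM scheme described in the bit-parallel suffix automata work listed under References; it is not nrgrep's own code, the function name `bndm_search` is hypothetical, and it covers none of the extended features (character classes, regular expressions, errors). It assumes Python's arbitrary-size integers stand in for a machine word, so the usual limit of one pattern character per bit of the word does not apply here.

```python
def bndm_search(pattern, text):
    """Return the start positions of all exact occurrences of pattern in text.

    Illustrative BNDM sketch: each window of the text is scanned backwards
    while a bit mask D tracks every pattern position at which the characters
    read so far could occur as a factor of the pattern.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # B[c] has bit (m - 1 - i) set when pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1      # m ones: every automaton state active
    top = 1 << (m - 1)       # set when a prefix of the pattern is recognized
    occurrences = []

    pos = 0
    while pos <= n - m:
        j = m                # characters of the window still unread
        last = m             # safe shift for the next window
        D = full
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & top:
                if j > 0:
                    # The suffix read so far is a pattern prefix:
                    # the next window may start here.
                    last = j
                else:
                    # Whole window read and matched.
                    occurrences.append(pos)
                    break
            D = (D << 1) & full
        pos += last
    return occurrences


if __name__ == "__main__":
    print(bndm_search("abra", "abracadabra"))  # prints [0, 7]
```

The window is abandoned as soon as `D` becomes zero, which is what lets this family of algorithms skip text characters instead of inspecting every one.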
#### 118 Citations
PatMatch: a program for finding patterns in peptide and nucleotide sequences
Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small…
Negative Factor: Improving Regular-Expression Matching in Strings
• Computer Science
• TODS
• 2016
An efficient algorithm that utilizes negative factors to prune candidates, then improves it by using bit operations to process negative factors in parallel, which shows that negative factors, when used with necessary factors, can achieve much better pruning power.
Improving regular-expression matching on strings using negative factors
• Computer Science
• SIGMOD '13
• 2013
An efficient algorithm that utilizes negative factors to prune candidates, then improves it by using bit operations to process negative factors in parallel, showing that negative factors, when used together with necessary factors (substrings that must appear in each answer), can achieve much better pruning power.
Fast and flexible string matching by combining bit-parallelism and suffix automata
• Computer Science
• JEAL
• 2000
A new automaton to recognize suffixes of patterns with classes of characters is introduced, which seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area.
Fast and simple character classes and bounded gaps pattern matching, with application to protein searching
• Computer Science
• RECOMB
• 2001
Two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques are designed, together with a criterion based on the form of the CBG to choose a priori the faster of the two.
Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching
• Mathematics, Computer Science
• J. Comput. Biol.
• 2003
Two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques are designed, and a criterion based on the form of the CBG to choose a priori the faster of the two is proposed.
From Nondeterministic Suffix Automaton to Lazy Suffix Tree
This paper takes the underlying nondeterministic suffix automaton and applies it to the text instead of to the pattern, and shows how the algorithm can be easily adapted to construct the suffix tree of T in a lazy manner.
New Techniques for Regular Expression Searching
• Mathematics, Computer Science
• Algorithmica
• 2004
Two new techniques for regular expression searching are presented, one able and one unable to skip text characters, which permit fast searching for regular expressions, normally faster than any existing algorithm.
Survey of Global Regular Expression Print (grep) Tools
• 2004
The UNIX grep utility marked the birth of global regular expression print (GREP) tools. Searching for patterns in text is an important operation in a number of domains, including program comprehension…
Bitwise data parallelism in regular expression matching
• R. Cameron, +4 authors Meng Lin
• Computer Science
• 2014 23rd International Conference on Parallel Architecture and Compilation (PACT)
• 2014
A new parallel algorithm for regular expression matching is developed and applied to the classical grep (global regular expression print) problem and can substantially outperform traditional grep implementations based on NFAs, DFAs or backtracking.

#### References

SHOWING 1-10 OF 38 REFERENCES
Fast and flexible string matching by combining bit-parallelism and suffix automata
• Computer Science
• JEAL
• 2000
A new automaton to recognize suffixes of patterns with classes of characters is introduced, which seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area.
A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching
• Computer Science
• CPM
• 1998
A new algorithm for string matching called BNDM, which is the bit-parallel simulation of a known (but recent) algorithm called BDM, and which can be extended to handle classes of characters in the pattern and in the text, multiple patterns and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility.
Faster Approximate String Matching
• Mathematics, Computer Science
• Algorithmica
• 1999
The algorithm is based on the simulation of a nondeterministic finite automaton built from the pattern and using the text as input, and it is shown that the algorithms are among the fastest for typical text searching, being the fastest in some cases.
Compact DFA Representation for Fast Regular Expression Search
• Computer Science, Mathematics
• WAE
• 2001
This work presents a new technique to encode a deterministic finite automaton (DFA) using (m+1)(2m-1 +|Σ|) bits, where m is the number of characters (excluding operator symbols) in the regular expression and Σ is the alphabet.
GLIMPSE: A Tool to Search Through Entire File Systems
• Computer Science
• USENIX Winter
• 1994
Glimpse is particularly designed for personal information, such as one's own file system, that should support many types of queries, flexible interaction, low overhead, and customization. All these are important features of glimpse.
Fast Pattern Matching in Strings
• Mathematics, Computer Science
• SIAM J. Comput.
• 1977
An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings, showing that the set of concatenations of even palindromes, i.e., the language $\{\alpha\alpha^R\}^*$, can be recognized in linear time.
A fast bit-vector algorithm for approximate string matching based on dynamic programming
• E. Myers
• Mathematics, Computer Science
• JACM
• 1999
An algorithm of comparable simplicity that requires only O(kn/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the approximate string matching problem, and is found to be more efficient than the previous results for many choices of k and small m.
From Regular Expressions to Deterministic Automata
• Computer Science, Mathematics
• Theor. Comput. Sci.
• 1986
The main theorem allows an elegant algorithm to be refined into an efficient one based on ‘marking of’ regular expressions based on derivatives of regular expressions, which constructs an automaton for the marked expression.
A fast string searching algorithm
• Computer Science
• CACM
• 1977
The algorithm has the unusual property that, in most cases, not all of the first $i$ characters of the string being searched are inspected.
Fast text searching: allowing errors
• Computer Science
• CACM
• 1992
The string-matching problem is a very common problem; there are many extensions to this problem; for example, it may be looking for a set of patterns, a pattern with "wild cards," or a regular expression.