NR‐grep: a fast and flexible pattern‐matching tool

@article{Navarro2001NRgrepAF,
  title={NR‐grep: a fast and flexible pattern‐matching tool},
  author={Gonzalo Navarro},
  journal={Software: Practice and Experience},
  year={2001},
  volume={31}
}
  • G. Navarro
  • Published 30 October 2001
  • Computer Science
  • Software: Practice and Experience
We present nrgrep (‘non‐deterministic reverse grep’), a new pattern‐matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit‐parallel simulation of a non‐deterministic suffix automaton. As a result, nrgrep can find from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity… 

PatMatch: a program for finding patterns in peptide and nucleotide sequences

Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small

Improving regular-expression matching on strings using negative factors

An efficient algorithm that utilizes negative factors to prune candidates, then improves it by using bit operations to process negative factors in parallel, showing that negative factors, when used together with necessary factors (substrings that must appear in each answer), can achieve much better pruning power.

Fast and flexible string matching by combining bit-parallelism and suffix automata

A new automaton to recognize suffixes of patterns with classes of characters is introduced, which seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area.

Fast and simple character classes and bounded gaps pattern matching, with application to protein searching

Two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques are designed, together with a criterion based on the form of the CBG to choose a priori the faster of the two.

Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching

Two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques are designed, and a criterion based on the form of the CBG to choose a priori the faster of the two is proposed.

From Nondeterministic Suffix Automaton to Lazy Suffix Tree

This paper takes the underlying nondeterministic suffix automaton and applies it to the text instead of to the pattern, and shows how the algorithm can be easily adapted to construct the suffix tree of T in a lazy manner.

New Techniques for Regular Expression Searching

Two new techniques for regular expression searching are presented, one able and one unable to skip text characters, which permit fast searching for regular expressions, normally faster than any existing algorithm.

Negative Factor

An efficient algorithm that utilizes negative factors to prune candidates, then improves it by using bit operations to process negative factors in parallel, which shows that negative factors, when used with necessary factors, can achieve much better pruning power.

Survey of Global Regular Expression Print (grep) Tools

This survey presents all the major developments in global regular expression print tools, namely the UNIX grep family, the GNU grep family, agrep, cGrep, sgrep, nrgrep, and Perl regular expressions.

Bitwise data parallelism in regular expression matching

  • R. Cameron, T. Shermer, Meng Lin
  • Computer Science
  • 2014 23rd International Conference on Parallel Architecture and Compilation (PACT)
  • 2014
A new parallel algorithm for regular expression matching is developed and applied to the classical grep (global regular expression print) problem and can substantially outperform traditional grep implementations based on NFAs, DFAs or backtracking.
...

References


Fast and flexible string matching by combining bit-parallelism and suffix automata

A new automaton to recognize suffixes of patterns with classes of characters is introduced, which seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area.

A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching

A new algorithm for string matching called BNDM, which is the bit-parallel simulation of a known (but recent) algorithm called BDM, and which can be extended to handle classes of characters in the pattern and in the text, multiple patterns, and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility.
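
To make the summary above concrete, here is a minimal sketch of BNDM for exact matching of a single pattern, written from the commonly published description of the algorithm. It is not code from the paper or from nrgrep; the function name `bndm_search` and its signature are illustrative assumptions, and Python's unbounded integers stand in for the machine word, so the usual limit on pattern length does not appear here.

```python
def bndm_search(pattern: str, text: str) -> list[int]:
    """Return the start positions of all occurrences of pattern in text.

    Illustrative sketch of BNDM (hypothetical helper, not the paper's code).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    full = (1 << m) - 1      # m one-bits: every pattern factor is still alive
    high = 1 << (m - 1)      # bit that signals "a pattern prefix was recognized"

    # masks[c] has bit (m - 1 - i) set whenever pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    matches = []
    pos = 0
    while pos <= n - m:
        j = m          # number of window characters not yet read
        last = m       # shift to apply after this window
        d = full
        # Read the window right to left while some pattern factor matches.
        while d != 0:
            d &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if d & high:
                if j > 0:
                    # The window suffix read so far is a pattern prefix;
                    # remember where it starts as a safe shift.
                    last = j
                else:
                    # The whole window was read and it equals the pattern.
                    matches.append(pos)
            d = (d << 1) & full
        pos += last
    return matches


print(bndm_search("abra", "abracadabra"))  # [0, 7]
```

The inner loop stops as soon as the characters read are not a factor of the pattern, which is what lets the search skip text characters. Handling character classes would only change how `masks` is built, since several characters could then set the same bit.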

Faster Approximate String Matching

The algorithm is based on the simulation of a nondeterministic finite automaton built from the pattern and using the text as input and it is shown that the algorithms are among the fastest for typical text searching, being the fastest in some cases.

Compact DFA Representation for Fast Regular Expression Search

This work presents a new technique to encode a deterministic finite automaton (DFA) using (m+1)(2m-1 +|Σ|) bits, where m is the number of characters (excluding operator symbols) in the regular expression and Σ is the alphabet.

GLIMPSE: A Tool to Search Through Entire File Systems

Glimpse is particularly designed for personal information, such as one's own file system, that should support many types of queries, flexible interaction, low overhead, and customization. All these are important features of glimpse.

Fast Pattern Matching in Strings

An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings, showing that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.

A fast bit-vector algorithm for approximate string matching based on dynamic programming

An algorithm of comparable simplicity that requires only O(kn/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the approximate string matching problem, and is found to be more efficient than the previous results for many choices of k and small m.

From Regular Expressions to Deterministic Automata

A fast string searching algorithm

The algorithm has the unusual property that, in most cases, not all of the first i characters of the string being searched are inspected.

Fast text searching: allowing errors

The string-matching problem is a very common problem; there are many extensions to this problem; for example, it may be looking for a set of patterns, a pattern with "wild cards," or a regular expression.