Programming Techniques: Regular expression search algorithm

@article{Thompson1968ProgrammingTR,
  title={Programming Techniques: Regular expression search algorithm},
  author={Ken Thompson},
  journal={Commun. ACM},
  year={1968},
  volume={11},
  pages={419--422}
}
  • K. Thompson
  • Published 1 June 1968
  • Computer Science
  • Commun. ACM
A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed. [...] The object program then accepts the text to be searched as input and produces a signal every time an embedded string in the text matches the given regular expression. Examples, problems, and solutions are also presented.
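As a rough illustration of the search described in the abstract, the Python sketch below builds a small nondeterministic automaton from a regular expression and advances a set of active states one text character at a time, recording a signal whenever the accepting state is reached. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract describes a compiler that produces an object program, whereas this code interprets an in-memory automaton. All names (`State`, `compile_regex`, `search`) are hypothetical, and the supported syntax is limited to literals, concatenation, `|`, `*`, and parentheses.

```python
class State:
    """NFA state. `char` is the literal it consumes, or None for an epsilon state."""
    def __init__(self, char=None):
        self.char = char
        self.out = []  # successor states

# Each fragment is a (start, end) pair; `end` is an epsilon state with no exits yet.
def literal(c):
    s, e = State(c), State()
    s.out.append(e)
    return s, e

def concat(a, b):
    a[1].out.append(b[0])
    return a[0], b[1]

def alt(a, b):
    s, e = State(), State()
    s.out += [a[0], b[0]]
    a[1].out.append(e)
    b[1].out.append(e)
    return s, e

def star(a):
    s, e = State(), State()
    s.out += [a[0], e]
    a[1].out += [a[0], e]
    return s, e

def compile_regex(pattern):
    """Recursive-descent parse of `pattern` into an NFA; returns (start, accept)."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        frag = concatenation()
        while peek() == "|":
            pos += 1
            frag = alt(frag, concatenation())
        return frag

    def concatenation():
        frag = repetition()
        while peek() is not None and peek() not in "|)":
            frag = concat(frag, repetition())
        return frag

    def repetition():
        nonlocal pos
        frag = atom()
        while peek() == "*":
            pos += 1
            frag = star(frag)
        return frag

    def atom():
        nonlocal pos
        c = peek()
        if c == "(":
            pos += 1
            frag = alternation()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return frag
        if c is None or c in "|)*":
            raise ValueError("unexpected token at position %d" % pos)
        pos += 1
        return literal(c)

    frag = alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return frag

def closure(states):
    """Add every state reachable through epsilon states."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        if s.char is None:
            for t in s.out:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen

def search(pattern, text):
    """Return each index at which some non-empty matching substring ends."""
    start, accept = compile_regex(pattern)
    current, hits = set(), []
    for i, ch in enumerate(text):
        # Re-adding the start state lets a match begin at any text position.
        active = closure(current | {start})
        current = closure({t for s in active if s.char == ch for t in s.out})
        if accept in current:
            hits.append(i)  # the "signal" for a match ending at position i
    return hits
```

For example, `search("a(b|c)*d", "xabcd ad")` returns `[4, 7]`, the indices at which a matching substring ends. Because all active states are advanced together, each text character is examined once and no backtracking is needed; empty matches are not reported in this sketch.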

Fast Regular Expression Search

TLDR
A new algorithm for regular expression search is presented; it is able to skip text characters and is the fastest one in many cases of interest.

A regular expression pattern matching processor for APL

TLDR
This paper discusses classical regular expressions and their extension into the domain of APL in terms of locator templates, which describe patterns to be searched for, and action templates, which specify an action to be performed when a match is encountered.

Fast text searching for regular expressions or automaton searching on tries

TLDR
This work obtains searching algorithms that run in logarithmic expected time in the size of the text for a wide subclass of regular expressions, and in sublinear expected time for any regular expression.

A compact function for regular expression pattern matching

TLDR
This paper describes a simple compiler and interpreter for a finite state machine recognizer of patterns represented by regular expressions, designed to be compact and to require little work space.

A fast regular expression indexing engine

TLDR
This paper presents the design, architecture, and lessons learned from the implementation of FREE, a fast regular-expression indexing engine that shows orders-of-magnitude performance improvement in certain cases over standard regular expression matching systems such as lex, awk, and grep.

Pattern Matching in Strings

TLDR
Most formal systems that handle strings can be regarded as defining patterns in strings; this holds for formal grammars and especially for regular expressions, which provide a technique for specifying simple patterns.

Efficient string matching

TLDR
A simple, efficient algorithm is described that locates all occurrences of any of a finite number of keywords in a string of text; it has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

Fast and compact regular expression matching

Efficient tree construction for formal language query processing

TLDR
The proposed algorithms are a preprocessing step for search algorithms which bypass the construction of a separate automaton for a given query.

Regular Expression Search on Compressed Text

TLDR
An algorithm for searching regular expression matches in compressed text that requires up to 25% less time than the state of the art and defines efficient data structures that yield nearly optimal complexity bounds.
...

References

Derivatives of Regular Expressions

TLDR
In this paper the notion of a derivative of a regular expression is introduced and the properties of derivatives are discussed; this leads, in a very natural way, to the construction of a state diagram from a regular expression containing any number of logical operators.
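The idea of a derivative can be sketched in a few lines of Python: the derivative of an expression with respect to a character describes what may follow that character, and a string matches if the expression left after consuming every character accepts the empty string. This is an illustrative sketch only; the class and function names (`Empty`, `Eps`, `Char`, `Cat`, `Alt`, `Star`, `And`, `Not`, `nullable`, `deriv`, `matches`) are hypothetical, and the simplification of equivalent expressions needed to obtain a finite state diagram is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:            # matches no string at all
    pass

@dataclass(frozen=True)
class Eps:              # matches only the empty string
    pass

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class Cat:
    left: object
    right: object

@dataclass(frozen=True)
class Alt:
    left: object
    right: object

@dataclass(frozen=True)
class And:              # logical operator: intersection
    left: object
    right: object

@dataclass(frozen=True)
class Not:              # logical operator: complement
    inner: object

@dataclass(frozen=True)
class Star:
    inner: object

def nullable(r):
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Not):
        return not nullable(r.inner)
    raise TypeError("unknown expression: %r" % (r,))

def deriv(r, a):
    """Derivative of r with respect to the single character a."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Cat):
        first = Cat(deriv(r.left, a), r.right)
        # If the left part can be empty, a may also be consumed by the right part.
        return Alt(first, deriv(r.right, a)) if nullable(r.left) else first
    if isinstance(r, Alt):
        return Alt(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, And):
        return And(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Not):
        return Not(deriv(r.inner, a))
    if isinstance(r, Star):
        return Cat(deriv(r.inner, a), r)
    raise TypeError("unknown expression: %r" % (r,))

def matches(r, s):
    """Take the derivative character by character, then test for the empty string."""
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)

# Strings over {a, b}, excluding exactly the string "aa".
example = And(Star(Alt(Char("a"), Char("b"))), Not(Cat(Char("a"), Char("a"))))
```

With this definition, `matches(example, "ab")` is true and `matches(example, "aa")` is false. Intersection and complement need no special machinery here, since the derivative simply passes through them.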

Representation of Events in Nerve Nets and Finite Automata

TLDR
This memorandum is devoted to an elementary exposition of the problems and of results obtained on the McCulloch-Pitts nerve net during investigations in August 1951.

IBM 7094 principles of operation. File No. 7094-01, Form A22-6703-1
