Efficient string matching

  title={Efficient string matching},
  author={Alfred V. Aho and Margaret John Corasick},
  journal={Communications of the ACM},
  pages={333 - 340}
  • A. Aho, M. J. Corasick
  • Published 1 June 1975
  • Economics, Computer Science, Education
  • Communications of the ACM
This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching… 

Figures from this paper

A Method for Improving String Pattern Matching Machines
This correspondence describes an efficient string pattern matching machine to locate all occurrences of any of a finite number of keywords and phrases in an arbitrary text string. Some conditions are
An Efficient Implementation of Static String Pattern Matching Machines
  • J. Aoe
  • Computer Science
    IEEE Trans. Software Eng.
  • 1989
A technique for implementing a static transition table of a string pattern matching machine which locates all occurrences of a finite number of keywords in a string is described and can be speeded up by a finite straight program without loops.
An efficient implemenatation of string pattern matching machines for a finite number of keywords
  • J. Aoe
  • Computer Science, Economics
  • 1989
A method of implementing a static transition table of a string pattern matching machine to locate all occurrences of a finite number of keywords in a text string by combining the fast access of an array representation with the compactness of a list structure.
A string pattern - matching algorithm
A Fast String-Searching Algorithm for Multiple Patterns
Approximate string matching: a simpler faster algorithm
We give two algorithms for finding all approximate matches of a pattern in a text, where the edit distance between the pattern and the matching text substring is at most k. The first algorithm, which
Approximate string matching with suffix automata
A newO(kn) algorithm for approximate string matching problem, wheren is the length of the text, based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table is given.
An Efficient Multi-Attribute Pattern Matching Machine
The proposed algorithm enables us to match set representations containing multiple attributes to locate all occurrences of any of a nite number of the sequence of rule structures (called matching rules) in a sequence of input structures.
Improved Approximate Multiple Pattern String Matching using Consecutive Q Grams of Pattern
To improve the efficiency of basic Shift OR algorithm by reducing the number of false matches that is detected along with the correct matches by the algorithm, proposed Shift OR with consecutive q grams has been implemented.


Fast Pattern Matching in Strings
An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings, showing that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.
By exploiting the formal similarity of string-matching with integer multiplication, a new algorithm has been obtained with a running time which is only slightly worse than linear.
Programming Techniques: Regular expression search algorithm
A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed. The compiler accepts a regular
Automatic generation of efficient lexical processors using finite state techniques
The practical application of the theory of finite-state automata to automatically generate lexical processors is dealt with in this tutorial article by the use of the AED RWORD system, developed at
The Design and Analysis of Computer Algorithms
This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers use of lists, push-down stacks, queues, trees, and graphs.
Implementation of the substring test by hashing
Tradeoff curves are developed to show minimal cost of file usage by grouping various partially combined indices under conditions offile usage with different fractions of retrieval and update.
Microtext: the design of a microprogrammed finite state search machine for full-text retrieval
The Microtext system represents a new approach to the design and implementation of a full-text retrieval system that integrates hardware, firmware, and software components in an attempt to provide a solution to the problems involved in processing large files of unformatted textual data.
Derivatives of Regular Expressions
In this paper the notion of a derivative of a regular expression is introduced atld the properties of derivatives are discussed and this leads, in a very natural way, to the construction of a state diagram from a regularexpression containing any number of logical operators.
A system for typesetting mathematics
The design and implementation of a system for typesetting mathematics, designed to be easy to learn and to use by people who know neither mathematics nor typesetting, is described.
Regular Expressions and State Graphs for Automata
Algorithms are presented for 1) converting a state graph describing the behavior of an automaton to a regular expression describing the behavior of the same automaton (section 2), and 2) for