A new approach to text searching

@article{BaezaYates1992ANA,
  title={A new approach to text searching},
  author={Ricardo Baeza-Yates and Gaston H. Gonnet},
  journal={Communications of the ACM},
  year={1992},
  volume={35},
  pages={74--82}
}
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with don't-care symbols and complement symbols, and multiple patterns. In addition, we solve the same problems allowing up to k mismatches. These algorithms run in real time, do not need to buffer the input, and are suitable for implementation in hardware.
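
The abstract does not spell out the mechanism. As a rough illustration only, the sketch below shows a bit-parallel "shift-and" search of the kind usually associated with this line of work, plus a counter-based variant for up to k mismatches. It is not taken from the paper: the function names (`shift_and_search`, `k_mismatch_search`) and the helper predicates (`lit`, `any_char`, `not_in`) are hypothetical, and the paper's own formulation may differ in detail. Each text character is processed once and never revisited, which matches the "no buffering" property claimed above.

```python
def lit(ch):
    """Pattern position that accepts exactly one character (hypothetical helper)."""
    return lambda c: c == ch

def any_char():
    """Don't-care position: accepts every character (hypothetical helper)."""
    return lambda c: True

def not_in(chars):
    """Complement position: accepts anything except the given characters."""
    return lambda c: c not in chars

def shift_and_search(pattern, text):
    """Return start indices where every pattern position accepts the text.

    `pattern` is a list of predicates. Bit i of `state` is set when
    pattern[0..i] matches the text ending at the current character.
    """
    m = len(pattern)
    if m == 0:
        return []
    masks = {}                      # per-character bit masks, built lazily
    accept_bit = 1 << (m - 1)
    state = 0
    hits = []
    for j, c in enumerate(text):
        if c not in masks:
            masks[c] = sum(1 << i for i, ok in enumerate(pattern) if ok(c))
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks[c]
        if state & accept_bit:
            hits.append(j - m + 1)
    return hits

def k_mismatch_search(pattern, text, k):
    """Return start indices where pattern matches text with at most k mismatches.

    Each pattern position gets a b-bit counter packed into one integer;
    counter i holds the mismatches of pattern[0..i] against the text
    ending at the current character.
    """
    m = len(pattern)
    if m == 0:
        return []
    b = m.bit_length()              # a counter never exceeds m, so b bits suffice
    field = (1 << b) - 1
    all_mask = (1 << (m * b)) - 1
    tables = {}                     # per-character mismatch increments, built lazily
    state = 0
    hits = []
    for j, c in enumerate(text):
        if c not in tables:
            tables[c] = sum(1 << (i * b) for i, p in enumerate(pattern) if p != c)
        # Shift counters up by one position and add the new mismatches.
        state = ((state << b) + tables[c]) & all_mask
        if j >= m - 1 and ((state >> ((m - 1) * b)) & field) <= k:
            hits.append(j - m + 1)
    return hits
```

For example, `shift_and_search([lit('a'), any_char(), lit('c')], "abcaxcac")` treats the middle position as a don't-care symbol, and `k_mismatch_search("abc", "abxabc", 1)` accepts windows that differ from the pattern in at most one position. Python integers are unbounded, so the sketch ignores the machine-word limits a real implementation would have to handle.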

Fast and Practical Approximate String Matching
Fast Regular Expression Search
A new algorithm for regular expression search is presented that can skip text characters and is the fastest in many cases of interest.
Approximate String Matching with SIMD
We consider the $k$ mismatches version of approximate string matching for a single pattern and multiple patterns. For these problems, we present new algorithms utilizing the single instruction
A comparison of the performance of four exact string matching algorithms
  • J. Leidig, C. Trefftz
  • Computer Science
  • 2007 IEEE International Conference on Electro/Information Technology
  • 2007
There are numerous exact string matching algorithms that have similar performance characteristics. Which algorithm is best depends on the length of the pattern being searched for, the number of
Parallel Architecture for Flexible Approximate Text Searching
A processor array design for flexible approximate string matching is presented which consists of two phases, i.e. preprocessing and searching, and a parallel architecture is derived from the computational schedule of the searching phase.
A Comparison of Approximate String Matching Algorithms
It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
A general compression algorithm that supports fast searching
A new string-pattern matching algorithm using partitioning and hashing efficiently
A new string-pattern matching algorithm that partitions the text into segments of the input pattern length and searches for pattern occurrences using a simple hashing scheme, providing a conceptually simpler way to search for patterns.
Fast Multiple String Matching Using Streaming SIMD Extensions Technology
A filter-based exact multiple string matching algorithm is presented that uses Intel's SSE (streaming SIMD extensions) technology for searching long strings and outperforms other solutions known to be among the fastest in practice.

References

Practical fast searching in strings
It is discovered that a method developed by Boyer and Moore can outperform even special‐purpose search instructions that may be built into the computer hardware for very short substrings.
Fast String Matching with Mismatches
We describe and analyze three simple and fast algorithms on the average for solving the problem of string matching with a bounded number of mismatches. These are the naive algorithm, an
Improved string searching
It is shown that it is possible to improve the average time of the Boyer‐Moore string matching algorithm using more space by applying a transformation that virtually increases the size of the alphabet in use.
Fast Pattern Matching in Strings
An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings, showing that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.
Efficient String Matching with Don’t-Care Patterns
This paper considers the extension of the methods of Aho and Corasick to deal with patterns involving more expressive descriptions, such as don’t-care (wild-card) symbols, complements, etc.
Experiments with a very fast substring search algorithm
The performances of similar, but language‐independent, algorithms are examined and results comparable with language‐based algorithms can be achieved with an adaptive technique.
Efficient Randomized Pattern-Matching Algorithms
We present randomized algorithms to solve the following string-matching problem and some of its generalizations: Given a string X of length n (the pattern) and a string Y (the text), find the first
Efficient String Matching with k Mismatches