# A new approach to text searching

@article{BaezaYates1992ANA,
title={A new approach to text searching},
author={Ricardo Baeza-Yates and Gaston H. Gonnet},
journal={Communications of the ACM},
year={1992},
volume={35},
pages={74--82}
}
• Published 1 October 1992
• Computer Science
• Communications of the ACM
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with don't-care symbols and complement symbols, and string matching with multiple patterns. In addition, we solve the same problems allowing up to k mismatches. These algorithms run in real time, do not need to buffer the input, and are suitable for implementation in hardware.
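The abstract does not name the underlying technique. The sketch below assumes the bit-parallel formulation usually associated with this paper (often called Shift-Or): one bit of state per pattern position, updated with a shift and an OR for each text character, so the text is read once and never buffered. The names `build_masks` and `shift_or_search` are hypothetical, and representing a pattern position as a string of accepted characters is an illustrative choice, not the authors' notation. Mismatches and multiple patterns are not covered.

```python
def build_masks(pattern):
    """Build one bitmask per character.

    `pattern` is a sequence whose i-th element is a string of the
    characters accepted at position i (a plain string works too, since
    each element is then a single character). Bit i of a character's
    mask is 0 when that character is accepted at position i.
    """
    all_ones = (1 << len(pattern)) - 1
    masks = {}
    for i, accepted in enumerate(pattern):
        for ch in accepted:
            masks[ch] = masks.get(ch, all_ones) & ~(1 << i)
    return masks, all_ones


def shift_or_search(text, pattern):
    """Return the start index of every occurrence of `pattern` in `text`."""
    m = len(pattern)
    if m == 0:
        return []
    masks, all_ones = build_masks(pattern)
    state = all_ones  # bit i == 0 means pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, ch in enumerate(text):
        # Shifting in a 0 starts a fresh match attempt at every position;
        # OR-ing the mask kills the attempts that `ch` does not extend.
        state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:
            hits.append(j - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_or_search("abracadabra", "abra"))
    print(shift_or_search("abcabd", ["a", "b", "cd"]))
```

In this representation a character class is a multi-character string, so `["a", "b", "cd"]` accepts both `abc` and `abd`. A don't-care position can be modeled as a class listing the whole alphabet, and a complement as a class listing every character except the excluded ones; both assume the alphabet is known in advance.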
407 Citations

Fast and Practical Approximate String Matching
• Computer Science, Physics
CPM
• 1992
Fast Regular Expression Search
• Computer Science
WAE
• 1999
A new algorithm for regular expression searching is presented. It can skip text characters and is the fastest in many cases of interest.
Very Fast and Simple Approximate String Matching
• Computer Science
Inf. Process. Lett.
• 1999
Approximate String Matching with SIMD
• Computer Science
• 2021
We consider the $k$ mismatches version of approximate string matching for a single pattern and multiple patterns. For these problems, we present new algorithms utilizing the single instruction
A comparison of the performance of four exact string matching algorithms
• Computer Science
2007 IEEE International Conference on Electro/Information Technology
• 2007
There are numerous exact string matching algorithms that have similar performance characteristics. Which algorithm is best depends on the length of the pattern being searched for, the number of
Parallel Architecture for Flexible Approximate Text Searching
• Computer Science
• 2003
A processor array design for flexible approximate string matching is presented. It consists of two phases, preprocessing and searching, and a parallel architecture is derived from the computational schedule of the searching phase.
A Comparison of Approximate String Matching Algorithms
• Computer Science
Softw. Pract. Exp.
• 1996
It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
A new string-pattern matching algorithm using partitioning and hashing efficiently
A new string-pattern matching algorithm that partitions the text into segments of the input pattern length and searches for pattern occurrences using a simple hashing scheme, providing a conceptually simpler way to search for patterns.
Fast Multiple String Matching Using Streaming SIMD Extensions Technology
• Computer Science
SPIRE
• 2012
A filter-based exact multiple string matching algorithm that uses Intel's SSE (streaming SIMD extensions) technology for searching long strings and outperforms other solutions known to be among the fastest in practice.

## References

SHOWING 1-10 OF 32 REFERENCES
A new approach to text searching
• Computer Science
SIGIR '89
• 1989
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with don't care symbols and complement symbols, and multiple patterns. In
Fast and Practical Approximate String Matching
• Computer Science, Physics
CPM
• 1992
Practical fast searching in strings
It is discovered that a method developed by Boyer and Moore can outperform even special‐purpose search instructions that may be built into the computer hardware for very short substrings.
Fast String Matching with Mismatches
• Computer Science, Physics
Inf. Comput.
• 1994
We describe and analyze three simple and fast algorithms on the average for solving the problem of string matching with a bounded number of mismatches. These are the naive algorithm, an
Improved string searching
It is shown that it is possible to improve the average time of the Boyer‐Moore string matching algorithm using more space by applying a transformation that virtually increases the size of the alphabet in use.
Fast Pattern Matching in Strings
• Computer Science
SIAM J. Comput.
• 1977
An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings, showing that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.
Efficient String Matching with Don’t-Care Patterns
This paper considers the extension of the methods of Aho and Corasick to deal with patterns involving more expressive descriptions, such as don’t-care (wild-card) symbols, complements, etc.
Experiments with a very fast substring search algorithm
The performances of similar, but language‐independent, algorithms are examined and results comparable with language‐based algorithms can be achieved with an adaptive technique.
Efficient Randomized Pattern-Matching Algorithms
• Computer Science, Mathematics
IBM J. Res. Dev.
• 1987
We present randomized algorithms to solve the following string-matching problem and some of its generalizations: Given a string X of length n (the pattern) and a string Y (the text), find the first
Efficient String Matching with k Mismatches
• Computer Science
Theor. Comput. Sci.
• 1986