A guided tour to approximate string matching

@article{Navarro2001AGT,
  title={A guided tour to approximate string matching},
  author={Gonzalo Navarro},
  journal={ACM Comput. Surv.},
  year={2001},
  volume={33},
  pages={31--88}
}
  • G. Navarro
  • Published 1 March 2001
  • Computer Science
  • ACM Comput. Surv.
We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to… 
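The online problem the survey focuses on can be made concrete with the classical dynamic-programming formulation under edit distance, commonly attributed to Sellers. A column C[0..m] is updated once per text character. C[0] is reset to 0 so that an occurrence may start anywhere, and a match is reported at text position j whenever C[m] <= k. The sketch below is a minimal illustration, not code from the survey. The function name `approx_search` and the convention of returning 0-based end positions are assumptions.

```python
def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Return 0-based end positions j such that some substring of text ending
    at j is within edit distance k of pattern (Sellers-style DP sketch)."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any suffix of
    # the text read so far; before reading any text it is simply i.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # previous column's value at row i - 1
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra text character
                      col[i - 1] + 1)    # pattern character left unmatched
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_search("abc", "xxabdxx", 1))  # [3, 4]
```

This baseline takes O(mn) time and O(m) space. The faster algorithms the survey reviews improve on it.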
A Hybrid Indexing Method for Approximate String Matching
TLDR
A new indexing method based on a suffix array combined with a partitioning of the pattern that can outperform by far all the existing alternatives for indexed approximate searching is presented.
A Preprocessing for Approximate String Matching
TLDR
This paper proposes an algorithm for the problem of approximate string matching that solves the match-count problem as a preprocessing step and makes clear the relation between the solutions of the two problems.
Fast Algorithms for Top-k Approximate String Matching
TLDR
This paper presents a general q-gram based framework for the top-k similar string matching problem and proposes two efficient algorithms, based on the strategies it introduces, that show superior performance.
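Q-gram methods typically rest on the count filter. If ed(s, t) <= tau, then s and t share at least max(|s|, |t|) - q + 1 - q*tau q-grams, counted with multiplicity, because each edit operation destroys at most q of them. The sketch below shows one simple way such a filter can prune a top-k search. It is an illustrative assumption, not the algorithm of the cited paper, and the names `top_k_similar`, `qgrams`, and `edit_distance` are hypothetical.

```python
import heapq
from collections import Counter


def qgrams(s: str, q: int) -> Counter:
    """Multiset of the q-grams (length-q substrings) of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def top_k_similar(query: str, strings: list[str], k: int, q: int = 2) -> list[tuple[int, str]]:
    """Return the k strings closest to query as sorted (distance, string) pairs."""
    if k <= 0:
        return []
    qg = qgrams(query, q)
    heap: list[tuple[int, str]] = []  # max-heap via negated distances
    for s in strings:
        if len(heap) == k:
            tau = -heap[0][0]  # worst distance currently kept
            # A string with ed <= tau - 1 must share at least `need` q-grams.
            need = max(len(query), len(s)) - q + 1 - q * (tau - 1)
            common = sum((qg & qgrams(s, q)).values())
            if common < need:
                continue  # ed >= tau, so s cannot improve the current top-k
        d = edit_distance(query, s)
        if len(heap) < k:
            heapq.heappush(heap, (-d, s))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, s))
    return sorted((-nd, s) for nd, s in heap)


print(top_k_similar("kitten", ["sitting", "kitten", "mitten", "bitten", "kitchen", "dog"], 2))
# [(0, 'kitten'), (1, 'mitten')]
```

As the threshold tau shrinks, the filter rejects more candidates without computing their edit distance. In the example, "bitten", "kitchen", and "dog" are all discarded by the q-gram count alone.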
Indexing Structures for Approximate String Matching
In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their
A Novel Algorithm for String Matching with Mismatches
TLDR
An online algorithm for string matching with mismatches, based on the frequencies of individual characters in the pattern and the text. It uses only simple arrays, avoiding the overhead of maintaining complex data structures such as suffix trees or automata.
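The cited algorithm is only summarized here. As a hedged illustration of the general idea of character-frequency filtering for k mismatches, the sketch below uses the classical counting filter, which is not necessarily the method of that paper. A text window of length m can be within Hamming distance k of the pattern only if at least m - k of its characters, counted as a multiset, also occur in the pattern. The sketch maintains that count incrementally with plain counters and verifies only the windows that pass. `k_mismatch_search` is a hypothetical name.

```python
from collections import Counter


def k_mismatch_search(pattern: str, text: str, k: int) -> list[int]:
    """Return start positions i with Hamming(pattern, text[i:i+m]) <= k."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))  # the empty pattern matches everywhere
    pcount = Counter(pattern)
    wcount: Counter = Counter()
    # deficit = sum over characters c of max(0, pcount[c] - wcount[c]),
    # i.e. pattern characters not covered by the current window.
    deficit = m
    hits = []
    for j, c in enumerate(text):
        if wcount[c] < pcount[c]:  # the new character covers a pattern character
            deficit -= 1
        wcount[c] += 1
        if j >= m:  # slide: drop text[j - m] so the window is text[j-m+1 : j+1]
            d = text[j - m]
            wcount[d] -= 1
            if wcount[d] < pcount[d]:
                deficit += 1
        # Hamming <= k implies at least m - k shared characters, i.e. deficit <= k.
        if j >= m - 1 and deficit <= k:
            i = j - m + 1
            if sum(a != b for a, b in zip(pattern, text[i:i + m])) <= k:
                hits.append(i)  # verified candidate
    return hits


print(k_mismatch_search("abc", "abcxbbcaac", 1))  # [0, 4, 7]
```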
Survey of Spatial Approximate String Search
  • B. Tech
  • Economics, Computer Science
  • 2013
TLDR
This survey reviews current techniques for string matching that allows errors. It focuses on spatial string searching and mostly on edit distance, covering its statistical behavior, its history and current developments, and the central ideas of the techniques and their difficulties.
Efficient Merging and Filtering Algorithms for Approximate String Searches
TLDR
This paper develops several algorithms that can greatly improve the performance of existing algorithms and studies how to integrate existing filtering techniques with these algorithms, and shows that they should be used together judiciously.
String Matching with Metric Trees Using an Approximate Distance
TLDR
This paper investigates the performance of metric trees, namely the M-tree, when they are extended with a cheap approximate distance function used as a filter to quickly discard irrelevant strings, and shows a performance improvement of up to 90% with respect to the basic case.
A Study on Similar String Matching
TLDR
This paper presents a general q-gram based framework for the top-k similar string matching problem and experimentally studies the efficient strategies it introduces on real data sets.
Average-optimal single and multiple approximate string matching
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough l-grams from text windows so as to prove that no occurrence can contain the part of the

References

Showing 1-10 of 239 references.
A New Indexing Method for Approximate String Matching
TLDR
A new indexing method, based on a suffix tree combined with a partitioning of the pattern, that outperforms by far all other algorithms for indexed approximate searching. It is also shown how this index can be implemented using much less space.
A Comparison of Approximate String Matching Algorithms
TLDR
It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
Block Edit Models for Approximate String Matching
Improving an Algorithm for Approximate Pattern Matching
TLDR
This work is an example of using a complex theoretical analysis of algorithms to drive both design and practical algorithm engineering, instead of the common practice of first designing an algorithm and then analyzing it.
Approximate String Matching
Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The
Theoretical and Empirical Comparisons of Approximate String Matching Algorithms
TLDR
A probabilistic analysis of the DP table is given in order to prove that the expected running time of the algorithm (as well as an earlier “cut-off” algorithm due to Ukkonen) is O(kn) for random text.
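Ukkonen's cut-off idea, mentioned above, can be sketched as follows. This is an illustration under our own assumptions, not the analyzed code. The search keeps the last active row lact, the largest i with C[i] <= k, and recomputes each column only up to row lact + 1. Rows beyond that are known to exceed k because values never decrease along a diagonal (C[i][j] >= C[i-1][j-1]). The row just past the active region is clipped to k + 1 instead of being read stale, which does not change which cells are <= k. The function name `approx_search_cutoff` is hypothetical.

```python
def approx_search_cutoff(pattern: str, text: str, k: int) -> list[int]:
    """Sellers-style search with Ukkonen's cut-off; returns 0-based end positions."""
    m = len(pattern)
    col = list(range(m + 1))  # column before reading any text: C[i] = i
    lact = min(k, m)          # last active row: largest i with col[i] <= k
    ends = []
    for j, c in enumerate(text):
        if lact < m:
            # The previous column's row lact+1 exceeds k; clip it to k+1 so a
            # stale value from an older column is never used.
            col[lact + 1] = k + 1
        top = min(lact + 1, m)  # only rows 0..top can become active
        prev_diag = col[0]
        col[0] = 0
        for i in range(1, top + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost, col[i] + 1, col[i - 1] + 1)
            prev_diag = col[i]
            col[i] = new
        # Shrink the active region; col[0] == 0 guarantees termination.
        lact = top
        while col[lact] > k:
            lact -= 1
        if lact == m:
            ends.append(j)
    return ends


print(approx_search_cutoff("abc", "xxabdxx", 1))  # [3, 4]
```

The result is the same as the full dynamic program, but each column costs O(lact) instead of O(m). The analysis summarized above shows an expected running time of O(kn) on random text.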
A Practical q -Gram Index for Text Retrieval Allowing Errors
TLDR
An indexing technique for approximate text searching that is practical, especially optimized for natural language text, and able to retrieve any string that approximately matches the search pattern, not only words.
New and faster filters for multiple approximate string matching
TLDR
The three new algorithms for on-line multiple string matching allowing errors are the first to allow more errors, and are faster than previous work for a moderate number of patterns (e.g. fewer than 50-100 on English text, depending on the pattern length).
Multiple Approximate String Matching
TLDR
Two new algorithms for on-line multiple approximate string matching are presented, both extensions of previous algorithms that search for a single pattern. The second partitions each pattern into sub-patterns that are searched with no errors, using a fast exact multipattern search algorithm.
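The partitioning idea summarized above relies on a pigeonhole argument. If the pattern is split into k + 1 pieces, each edit operation alters at most one piece, so an occurrence with at most k errors must contain at least one piece unchanged. The single-pattern sketch below illustrates this filter under simplifying assumptions. It finds each piece with Python's `str.find` instead of a fast exact multipattern search, then verifies the surrounding text window with the dynamic-programming search. `partition_filter_search` and `sellers_ends` are hypothetical names.

```python
def sellers_ends(pattern: str, text: str, k: int) -> list[int]:
    """Verification step: end positions of matches with edit distance <= k."""
    m = len(pattern)
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag, col[0] = col[0], 0
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            prev_diag, col[i] = col[i], min(prev_diag + cost, col[i] + 1, col[i - 1] + 1)
        if col[m] <= k:
            ends.append(j)
    return ends


def partition_filter_search(pattern: str, text: str, k: int) -> list[int]:
    """Filter with k+1 exact pieces, then verify candidate windows."""
    m, n = len(pattern), len(text)
    if m < k + 1:
        return sellers_ends(pattern, text, k)  # pieces would be empty
    # k+1 contiguous, non-empty pieces of near-equal length.
    bounds = [i * m // (k + 1) for i in range(k + 2)]
    ends = set()
    for lo, hi in zip(bounds, bounds[1:]):
        piece = pattern[lo:hi]
        p = text.find(piece)
        while p != -1:
            # An occurrence containing this piece at text position p starts
            # within k of p - lo and ends at most m + k characters later.
            ws = max(0, p - lo - k)
            we = min(n, p - lo + m + k)
            for e in sellers_ends(pattern, text[ws:we], k):
                ends.add(ws + e)
            p = text.find(piece, p + 1)  # allow overlapping piece occurrences
    return sorted(ends)


print(partition_filter_search("abc", "xxabdxx", 1))  # [3, 4]
```

Only windows around exact piece occurrences are verified, so the filter pays off when pieces are long enough to occur rarely, i.e. for low error levels.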
Incremental String Comparison
TLDR
This paper considers an incremental version of comparing two sequences A and B to determine their longest common subsequence (LCS) or the edit distance between them, and obtains O(nk) algorithms for the longest prefix approximate match problem, the approximate overlap problem, and cyclic string comparison.