Suffix arrays: a new method for on-line string searches

@article{Manber1990SuffixAA,
  title={Suffix arrays: a new method for on-line string searches},
  author={Udi Manber and Eugene W. Myers},
  journal={SIAM J. Comput.},
  year={1990},
  volume={22},
  pages={935--948}
}
A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel algorithms. The main advantage of suffix arrays over suffix trees is that, in practice, they use three to five times less space. From a complexity standpoint, suffix arrays permit on-line string searches of the type, “Is W a substring of A?” to be answered in time O(P + log…
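
The "sort and search" idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it builds the array by sorting whole suffixes directly and answers queries with a plain binary search, so it makes no attempt at the refined time bounds the abstract refers to. The function names (`build_suffix_array`, `find_occurrences`, `is_substring`) are hypothetical and chosen only for illustration.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive "sort" step: order suffix start indices by the suffix they denote.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions at which pattern occurs in text."""
    m = len(pattern)
    # "Search" step, part 1: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Part 2: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


def is_substring(text: str, sa: List[int], pattern: str) -> bool:
    """Answer the query 'Is pattern a substring of text?'."""
    return bool(find_occurrences(text, sa, pattern))


text = "banana"
sa = build_suffix_array(text)
print(sa)                                  # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))   # [1, 3]
print(is_substring(text, sa, "nab"))       # False
```

Because the suffixes are stored only as integer positions into the text, the index is a single array of length equal to the text, which is the source of the space advantage over suffix trees that the abstract describes.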

Suffix Arrays for Multiple Strings: A Method for On-Line Multiple String Searches
TLDR
The generalized suffix array is applied to the problem of finding all occurrences of an m×m matrix as a submatrix in a larger n×n matrix (the text) and is the average-case fastest algorithm in its class.
Suffix Trays and Suffix Trists: Structures for Faster Text Indexing
TLDR
A suffix trist, a cross between a suffix tree and a suffix list, is suggested; it supports queries in O(m+log|Σ|) time, and its space and text update time are the same as for the suffix tree or the suffix list.
Computing Longest Common Substrings Via Suffix Arrays
TLDR
This paper presents an alternative, remarkably simple approach to the longest common substring problem, which relies on the notion of suffix arrays and seems to be quite practical.
Simple Linear Work Suffix Array Construction
TLDR
The skew algorithm for suffix array construction over integer alphabets that can be implemented to run in linear time using integer sorting as its only nontrivial subroutine is introduced.
A Fast Algorithm for Constructing Suffix Arrays for Fixed-Size Alphabets
TLDR
This paper presents a fast algorithm for constructing suffix arrays over fixed-size alphabets; when the size of the alphabet is fixed, it constructs suffix arrays faster than any other algorithm developed for integer or general alphabets.
Dynamic Suffix Array with Sub-linear update time and Poly-logarithmic Lookup Time
TLDR
A data structure for maintaining a representation of the suffix array of a dynamic string which undergoes symbol substitutions, deletions, and insertions is presented and can be used to obtain sub-linear dynamic algorithms for several classical string problems for which efficient dynamic solutions were not previously known.
Suffix Trays and Suffix Trists: Structures for Faster Text Indexing
TLDR
A cross between a suffix tree and a suffix list (a dynamic variant of the suffix array), called a suffix trist, is presented; it supports queries in O(m+log|Σ|) time and uses linear space.
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
TLDR
An index structure is constructed that occupies only O(n) bits, compares favorably with inverted lists in space, and achieves optimal O(m/log n) search time for sufficiently large $m = \Omega(\log^{3+\epsilon} n)$.
Contracted Suffix Trees: A Simple and Dynamic Text Indexing Data Structure
TLDR
A data structure called a sequence tree, which was proposed by Coffman and Eve for hashing, is modified and adapted to the new problem of finding the locations of all instances of a string P in a text T, and built in O(||P|| + k) time.
Position heaps: A simple and dynamic text indexing data structure

References

Approximate string matching in sublinear expected time
  • W. I. Chang, E. Lawler
  • Computer Science
    Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science
  • 1990
TLDR
The authors have devised an algorithm that, for k < m/log n + O(1), runs in time O((n/m) k log n) on the average; in the worst case their algorithm is O(nk), but it is still an improvement in that it is very practical and uses only O(n) space compared with O(n) or O(N^2).
Structural Properties of the String Statistics Problem
Fast Algorithms for Finding Nearest Common Ancestors
TLDR
An algorithm is presented for a random access machine with uniform cost measure (and a bound of $\Omega(\log n)$ on the number of bits per word) that requires $O(1)$ time per query and $O(n)$ preprocessing time, assuming that the collection of trees is static.
Parallel Log-time Construction of Suffix Trees
TLDR
A CREW Parallel RAM algorithm is presented here which takes O(log n) time with n processors, n being the length of the input string, but only O(n log n) cells need to be initialized.
On Finding Lowest Common Ancestors: Simplification and Parallelization
TLDR
A linear time and space preprocessing algorithm that enables each query to be answered in $O(1)$ time, as in Harel and Tarjan, and has the advantage of being simple and easily parallelizable.
A Space-Economical Suffix Tree Construction Algorithm
A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching. This algorithm has the same asymptotic running time bound as previously
Rapid identification of repeated patterns in strings, trees and arrays
TLDR
This paper describes a strategy for constructing efficient algorithms for solving two types of matching problems and develops explicit algorithms for these two problems applied to strings and arrays.
Fast Parallel and Serial Approximate String Matching
Improving Quicksort Performance with a Codeword Data Structure
TLDR
It is shown how the ordering of keys is preserved by an adequate choice of the code generator and how this can be applied to the quicksort algorithm.