Linear work suffix array construction


Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency, while theoreticians use suffix trees due to linear-time construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple linear-time construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a space-efficient implementation and, moreover, supports the choice of a space-time tradeoff. For any v ∈ [1, √n], it runs in O(vn) time using O(n/√v) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
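
To make the two central notions in the abstract concrete, here is a minimal Python sketch. It is not the paper's 50-line C++ implementation and it does not run in linear time. It only illustrates what a suffix array is and what the difference cover property means, assuming the standard definition that a set D is a difference cover modulo v when every residue modulo v is a difference of two elements of D. The function names (naive_suffix_array, is_difference_cover, common_shift) are hypothetical and chosen for illustration.

```python
def naive_suffix_array(s):
    """Return the suffix array of s by directly sorting all suffixes.

    This is a reference definition only: comparing whole suffixes makes it
    far slower than the linear-time construction the abstract describes.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def is_difference_cover(D, v):
    """Check whether D is a difference cover modulo v.

    Assumed definition: every residue 0..v-1 can be written as
    (a - b) mod v for some a, b in D.
    """
    diffs = {(a - b) % v for a in D for b in D}
    return diffs == set(range(v))


def common_shift(i, j, D, v):
    """Find a shift k in [0, v) such that both i+k and j+k land in D mod v.

    When D is a difference cover such a k exists for every pair (i, j),
    so two arbitrary suffixes can be compared by looking at no more than
    v - 1 characters and then at two already-ranked sample suffixes.
    Returns None if no shift exists.
    """
    for k in range(v):
        if (i + k) % v in D and (j + k) % v in D:
            return k
    return None


if __name__ == "__main__":
    print(naive_suffix_array("banana"))       # [5, 3, 1, 0, 4, 2]
    print(is_difference_cover({1, 2}, 3))     # True
    print(common_shift(0, 1, {1, 2}, 3))      # 1
```

With v = 3 and D = {1, 2}, the positions i with i mod 3 in D are the sample positions. The existence of a common shift for every pair of positions is what lets the remaining suffixes be ordered against the sorted sample.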

DOI: 10.1145/1217856.1217858

Cite this paper

@article{Krkkinen2006LinearWS, title={Linear work suffix array construction}, author={Juha K{\"a}rkk{\"a}inen and Peter Sanders and Stefan Burkhardt}, journal={J. ACM}, year={2006}, volume={53}, pages={918-936} }