# Succinct Suffix Arrays based on Run-Length Encoding

@article{Mkinen2005SuccinctSA,
title={Succinct Suffix Arrays based on Run-Length Encoding},
author={Veli M{\"a}kinen and Gonzalo Navarro},
journal={Nord. J. Comput.},
year={2005},
volume={12},
pages={40-66}
}
• Published 1 March 2005
• Computer Science
• Nord. J. Comput.
A succinct full-text self-index is a data structure built on a text T = t1t2...tn, which takes little space (ideally close to that of the compressed text), permits efficient search for the occurrences of a pattern P = p1p2...pm in T, and is able to reproduce any text substring, so the self-index replaces the text. Several remarkable self-indexes have been developed in recent years. Many of those take space proportional to nH0 or nHk bits, where Hk is the kth order empirical entropy of T. The…
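The space bounds nH0 and nHk mentioned in the abstract can be made concrete with a small calculation. The following is a minimal sketch, not code from the paper: the function names `zero_order_entropy` and `kth_order_entropy` are hypothetical, and it assumes the common definition in which Hk averages the zero-order entropy of the characters that follow each length-k context (some authors use the preceding characters instead).

```python
from collections import Counter, defaultdict
from math import log2


def zero_order_entropy(text):
    """H0: average bits per symbol when each symbol is coded by its frequency."""
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    return sum((c / n) * log2(n / c) for c in counts.values())


def kth_order_entropy(text, k):
    """Hk under the assumed definition: group each character by the k
    characters preceding it, then average the H0 of each group, weighted
    by group size and divided by the text length n."""
    n = len(text)
    if n == 0:
        return 0.0
    followers = defaultdict(list)
    for i in range(k, n):
        followers[text[i - k:i]].append(text[i])
    return sum(len(group) * zero_order_entropy(group)
               for group in followers.values()) / n


text = "abababab"
print(len(text) * zero_order_entropy(text))    # n * H0 in bits
print(len(text) * kth_order_entropy(text, 1))  # n * H1 in bits
```

For a highly repetitive text such as the one above, H1 drops to zero while H0 stays at one bit per symbol, which illustrates why higher-order entropy bounds can be much smaller.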
196 Citations
• Computer Science
J. ACM
• 2020
This article shows how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently (in O(occ log log n) time) within O(r) space, and outperforms the space-competitive alternatives by 1 to 2 orders of magnitude in time.
• Computer Science
Algorithmica
• 2010
Stronger Lempel-Ziv based indices (LZ-indices) are presented, improving the overall performance of the original LZ-index and achieving indices requiring (2+ε)uHk(T)+o(u log σ) bits of space, for any constant ε>0, which makes them the smallest existing LZ-indices.
• Computer Science
CPM
• 2006
Two different approaches to reduce the space requirement of the LZ-index are presented, and it is shown how the space can be squeezed to (1 + ε)uHk(T) + o(u log σ) to obtain a structure with O(m^2) average search time for $m \geqslant 2\log_\sigma{u}$.
• Computer Science
ISAAC
• 2005
This paper presents a practical space-efficient algorithm to construct the LZ-index, requiring (4+ε)uHk+o(u) bits of space, for any constant 0<ε<1, and O(σu) time, where σ is the alphabet size.
• Computer Science
CPM
• 2022
This paper focuses on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same character a by a^p, where p is the length of the run.
This thesis proposes a deep study of compressed full-text self-indexes based on the Ziv-Lempel compression algorithm, focusing on Navarro's LZ-index, which has many interesting properties: fast full-text searching and text recovery; using little space for construction and operation; allowing insertion and deletion of text; providing a range of space/time trade-offs; and efficient construction and search in secondary memory.
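The run-length encoding described in this summary can be sketched in a few lines. This is an illustrative sketch only, not code from the cited paper; the function names `run_length_encode` and `run_length_decode` are hypothetical, and each run is stored as a (character, length) pair.

```python
from itertools import groupby


def run_length_encode(text):
    """Return one (character, run_length) pair per maximal run of equal characters."""
    return [(ch, sum(1 for _ in run)) for ch, run in groupby(text)]


def run_length_decode(pairs):
    """Rebuild the original string by repeating each character run_length times."""
    return "".join(ch * length for ch, length in pairs)


encoded = run_length_encode("aaabccdddd")
print(encoded)                     # [('a', 3), ('b', 1), ('c', 2), ('d', 4)]
print(run_length_decode(encoded))  # aaabccdddd
```

The encoding is only compact when the string has few runs relative to its length, which is the situation the run-length based indexes on this page exploit.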
• Computer Science
SODA
• 2018
This paper shows how to extend the Run-Length FM-index so that it can locate the occurrences of a pattern efficiently within O(r) space (in log-logarithmic time each), reaching optimal time $O(m+occ)$ within $O(r\log(n/r))$ space, on a RAM machine of $w=\Omega(\log n)$ bits.
• Computer Science
• 2015
This work proposes a variant of the FCST that improves pattern matching both in theory and in practice using a blind search approach and shows that the implementation outperforms the previous prototype in both space consumption and query/construction time.
• Computer Science
• 2008
New static/dynamic full-text self-indexes based on run-length encoding, whose space requirements are much less dependent on N, are developed. They can be plugged into a recent dynamic fully-compressed suffix tree using an additional O((N/δ) log N) bits of space for any δ = polylog(N), retaining the polylog(N) time slowdown on operations.

## References


• Computer Science
SPIRE
• 2004
The main problem of the FM-index is that its space usage depends exponentially on σ, that is, $5H_k n + \sigma^\sigma o(n)$ for any k, $H_k$ being the k-th order entropy of T.
• Computer Science
• 2004
For the FM-index, it is shown how the same ideas can be used to obtain an index needing $O(H_k n)$ bits of space, with the constant factor depending only logarithmically on σ.
• Computer Science
CPM
• 2004
It is shown that the occ occurrence positions of a pattern of length m in a text of length n can be reported in O((m+occ) log n) time using the CCSA, whose representation needs $O(n(1+H_k \log n))$ bits for any k, $H_k$ being the k-th order empirical entropy of the text.
• Computer Science
STOC '00
• 2000
An index structure is constructed that occupies only O(n) bits, compares favorably with inverted lists in space, and achieves optimal O(m/log n) search time for sufficiently large $m = \Omega(\log^{3+\epsilon} n)$.
• Computer Science
ISAAC
• 2004
The most remarkable one is that the CSA does not need any complicated sub-linear structures based on the four-Russians technique, and it is shown that sampling and compression are enough to achieve O(m log n) query time using less space than the original structure.
A compressed text database based on the compressed suffix array is proposed, and the relationship with the opportunistic data structure of Ferragina and Manzini is shown.
• Computer Science
J. Algorithms
• 1998
This work gives a representation of a suffix tree that uses $n \lg n + O(n)$ bits of space and supports searching for a pattern in the given text in O(m) time, and develops a structure that uses a suffix array and an additional o(n) bits.
• Computer Science
SODA '03
• 2003
We present a novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet σ,
Two succinct data structures are introduced for storing the information of lcp, the longest common prefix, between suffixes in the suffix array, and an improvement in the compressed suffix array which supports linear time counting queries for any pattern.