Succinct Suffix Arrays based on Run-Length Encoding
@article{Mkinen2005SuccinctSA, title={Succinct Suffix Arrays based on Run-Length Encoding}, author={Veli M{\"a}kinen and Gonzalo Navarro}, journal={Nord. J. Comput.}, year={2005}, volume={12}, pages={40-66} }
A succinct full-text self-index is a data structure built on a text T = t1t2...tn, which takes little space (ideally close to that of the compressed text), permits efficient search for the occurrences of a pattern P = p1p2...pm in T, and is able to reproduce any text substring, so the self-index replaces the text. Several remarkable self-indexes have been developed in recent years. Many of those take space proportional to nH0 or nHk bits, where Hk is the kth order empirical entropy of T. The…
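The nH0 bound mentioned in the abstract can be made concrete with a small sketch. The function below computes the zero-order empirical entropy H0 of a string in bits per symbol, using the standard definition (the sum over symbols c of (n_c/n) log2(n/n_c)). The name `empirical_entropy_h0` is hypothetical and chosen only for illustration; it is not taken from the paper.

```python
import math
from collections import Counter


def empirical_entropy_h0(text: str) -> float:
    """Zero-order empirical entropy H0 of `text`, in bits per symbol.

    Illustrative helper (hypothetical name): H0 = sum_c (n_c/n) * log2(n/n_c),
    where n_c is the number of occurrences of symbol c and n = len(text).
    """
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    t = "abracadabra"
    h0 = empirical_entropy_h0(t)
    # n * H0 is the zero-order space bound (in bits) the abstract refers to.
    print(f"H0 = {h0:.4f} bits/symbol, n*H0 = {len(t) * h0:.2f} bits")
```

A text consisting of a single repeated symbol has H0 = 0, while a text in which all σ symbols are equally frequent has H0 = log2 σ.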
196 Citations
Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space
- Computer Science, J. ACM
- 2020
This article shows how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently (in O(occ log log n) time) within O(r) space, and outperforms the space-competitive alternatives by 1--2 orders of magnitude in time.
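As background for the O(r) bound above: r is conventionally the number of maximal runs of equal symbols in the Burrows-Wheeler transform (BWT) of the text. The following is a minimal sketch that builds the BWT naively by sorting rotations and counts its runs. The function names `bwt` and `count_runs` and the `$` sentinel are assumptions made for illustration; the construction is quadratic and is not the method used in the cited work.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes `sentinel` does not occur in `text` and sorts before every
    other symbol (true for "$" against ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal consecutive symbols in `s`."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))
```

On repetitive texts the BWT tends to contain long runs, so r can be much smaller than n, which is what makes space bounds stated in terms of r attractive.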
Stronger Lempel-Ziv Based Compressed Text Indexing
- Computer Science, Algorithmica
- 2010
Stronger Lempel-Ziv based indices (LZ-indices) are presented, improving the overall performance of the original LZ-index and achieving indices requiring (2+ε)uHk(T) + o(u log σ) bits of space, for any constant ε>0, which makes them the smallest existing LZ-indices.
Reducing the Space Requirement of LZ-Index
- Computer Science, CPM
- 2006
Two different approaches to reduce the space requirement of LZ-index are presented and it is shown how the space can be squeezed to (1 + ε)uHk(T) + o(u log σ) to obtain a structure with O(m^2) average search time for $m \geqslant 2\log_\sigma{u}$.
Space-efficient construction of Lempel-Ziv compressed text indexes
- Computer Science, Inf. Comput.
- 2011
Space-Efficient Construction of LZ-Index
- Computer Science, ISAAC
- 2005
This paper presents a practical space-efficient algorithm to construct LZ-index, requiring (4+ε)uHk + o(u) bits of space, for any constant 0<ε<1, and O(σu) time, where σ is the alphabet size.
Minimal Absent Words on Run-Length Encoded Strings
- Computer Science, CPM
- 2022
This paper focuses on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same character a by a^p, where p is the length of the run.
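The RLE representation described above can be sketched in a few lines. Each maximal run of a character a of length p becomes the pair (a, p). The function names `rle_encode` and `rle_decode` are hypothetical and are used only for illustration, not taken from the cited paper.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(s: str) -> List[Tuple[str, int]]:
    """Run-length encode `s` as a list of (character, run length) pairs."""
    # groupby yields one group per maximal run of equal adjacent characters.
    return [(ch, sum(1 for _ in group)) for ch, group in groupby(s)]


def rle_decode(runs: List[Tuple[str, int]]) -> str:
    """Inverse of rle_encode: expand each (a, p) pair back to a repeated p times."""
    return "".join(ch * p for ch, p in runs)


if __name__ == "__main__":
    runs = rle_encode("aaabbc")
    print(runs)
    print(rle_decode(runs))
```

The encoding has one pair per run, so its size depends on the number of runs rather than on the length of the string.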
Ziv-Lempel Compressed Full-Text Self-Indexes
- Computer Science
This thesis proposes a deep study of compressed full-text self-indexes based on the Ziv-Lempel compression algorithm, focusing on Navarro’s LZ-index, which has many interesting properties: fast full-text searching and text recovery; using little space for construction and operation; allowing insertion and deletion of text; providing a range of space/time trade-offs; and efficient construction and search in secondary memory.
Optimal-Time Text Indexing in BWT-runs Bounded Space
- Computer Science, SODA
- 2018
This paper shows how to extend the Run-Length FM-index so that it can locate the occurrences of a pattern efficiently within O(r) space (in loglogarithmic time each), and reaches optimal time $O(m+occ)$ within $O(r\log(n/r))$ space, on a RAM machine of $w=\Omega(\log n)$ bits.
Engineering Fully-Compressed Suffix Trees
- Computer Science
- 2015
This work proposes a variant of the FCST that improves pattern matching both in theory and in practice using a blind search approach and shows that the implementation outperforms the previous prototype in both space consumption and query/construction time.
Run-Length Compressed Indexes for Repetitive Sequence Collections
- Computer Science
- 2008
New static/dynamic full-text self-indexes based on the run-length encoding whose space-requirements are much less dependent on N are developed, and can be plugged into a recent dynamic fully-compressed suffix tree using an additional O((N/δ) log N) bits of space for any δ = polylog(N), and retaining the polylog(N) time slowdown on operations.
References
SHOWING 1-10 OF 56 REFERENCES
First Huffman, Then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index
- Computer Science, SPIRE
- 2004
The main problem of the FM-index is that its space usage depends exponentially on σ, that is, $5H_k n + \sigma^\sigma o(n)$ for any k, $H_k$ being the k-th order entropy of T.
Run-Length FM-index
- Computer Science
- 2004
This work shows how the same ideas as in the FM-index can be used to obtain an index needing O(Hkn) bits of space, with the constant factor depending only logarithmically on σ.
Compressed Compact Suffix Arrays
- Computer Science, CPM
- 2004
It is shown that the occ occurrence positions of a pattern of length m in a text of length n can be reported in O((m+occ) log n) time using the CCSA, whose representation needs $O(n(1 + H_k \log n))$ bits for any k, $H_k$ being the k-th order empirical entropy of the text.
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
- Computer Science, STOC '00
- 2000
An index structure is constructed that occupies only O(n) bits and compares favorably with inverted lists in space and achieves optimal O(m/log n) search time for sufficiently large $m = \Omega(\log^{3+\epsilon} n)$.
Advantages of Backward Searching - Efficient Secondary Memory and Distributed Implementation of Compressed Suffix Arrays
- Computer Science, ISAAC
- 2004
The most remarkable one is that the CSA does not need any complicated sub-linear structures based on the four-Russians technique, and it is shown that sampling and compression are enough to achieve O(m log n) query time using less space than the original structure.
Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array
- Computer Science, ISAAC
- 2000
A compressed text database based on the compressed suffix array is proposed, and the relationship with the opportunistic data structure of Ferragina and Manzini is shown.
Space Efficient Suffix Trees
- Computer Science, J. Algorithms
- 1998
This work gives a representation of a suffix tree that uses \(n \lg n + O(n)\) bits of space and supports searching for a pattern in the given text in O(m) time and develops a structure that uses a suffix array and an additional o(n) bits.
High-order entropy-compressed text indexes
- Computer Science, SODA '03
- 2003
We present a novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet σ,…
Succinct representations of lcp information and improvements in the compressed suffix arrays
- Computer Science, SODA '02
- 2002
Two succinct data structures are introduced for storing the information of lcp, the longest common prefix, between suffixes in the suffix array, and an improvement in the compressed suffix array which supports linear time counting queries for any pattern.