# Composite Repetition-Aware Data Structures

@article{Belazzougui2015CompositeRD,
title={Composite Repetition-Aware Data Structures},
author={Djamal Belazzougui and Fabio Cunial and Travis Gagie and Nicola Prezza and Mathieu Raffinot},
journal={ArXiv},
year={2015},
volume={abs/1502.05937}
}
• Published 20 February 2015
• Computer Science
• ArXiv
In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the…
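As a rough illustration of what "measures of repetition" can mean, the sketch below computes two quantities that the entries on this page refer to: the number of equal-letter runs in the Burrows-Wheeler transform (the quantity behind the RLBWT, usually written r) and the number of factors in a greedy LZ77 parse (usually written z). The function names `bwt`, `count_runs`, and `lz77_factor_count` are hypothetical, the algorithms are deliberately naive, and the LZ77 variant shown (overlapping sources allowed, a new character forms its own factor) is only one of several definitions in use. It is not the data structure described in the paper.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, demo only).

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def lz77_factor_count(text: str) -> int:
    """Number of factors in a greedy LZ77 parse (naive, demo only).

    Each factor is the longest prefix of the remaining text that also
    occurs starting at an earlier position (source and copy may overlap),
    or a single character if no such prefix exists.
    """
    n = len(text)
    i = 0
    z = 0
    while i < n:
        length = 0
        # Try to extend the factor by one character at a time. Restricting
        # the search window to text[0 : i + length] forces any match of
        # length (length + 1) to start strictly before position i.
        while (i + length < n
               and text.find(text[i:i + length + 1], 0, i + length) != -1):
            length += 1
        i += max(length, 1)
        z += 1
    return z


if __name__ == "__main__":
    repetitive = "GATTACA" * 20  # toy stand-in for a repetitive collection
    print("n =", len(repetitive))
    print("BWT runs r =", count_runs(bwt(repetitive)))
    print("LZ77 factors z =", lz77_factor_count(repetitive))
```

On a string built by repeating a short block, both counts stay far below the text length, which is the kind of sublinear growth the abstract refers to.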
• Computer Science
ArXiv
• 2016
This paper explores the practical advantages of combining data structures whose size depends on distinct measures of repetition, and describes a range of practical variants that combine RLBWT with the set of boundaries of the Lempel-Ziv 77 factors of a string, which take space proportional to the number of factors.
• Computer Science
CiE
• 2017
Practical data structures that support counting and locating all the exact occurrences of a pattern in a repetitive text are described, by combining the run-length encoded Burrows-Wheeler transform (RLBWT) with the boundaries of Lempel-Ziv 77 factors.
• Computer Science
J. ACM
• 2020
This article shows how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently (in O(occ log log n) time) within O(r) space, and outperforms the space-competitive alternatives by 1-2 orders of magnitude in time.
• Computer Science
SODA
• 2018
This paper shows how to extend the Run-Length FM-index so that it can locate the occurrences of a pattern efficiently within $O(r)$ space (in log-logarithmic time each), reaching optimal time $O(m+occ)$ within $O(r\log(n/r))$ space, on a RAM machine of $w=\Omega(\log n)$ bits.
The main goal is to develop practical and flexible succinct indexes to support pattern matching and document retrieval operations on repetitive string collections.
This survey describes the distinct compression paradigms that have been used to exploit repetitiveness, and the algorithmic techniques that provide direct access to the compressed strings.
• Computer Science
• 2018
This paper develops the first universal compressed self-index, that is, the first indexing data structure based on string attractors, which can be built on top of any dictionary-compressed text representation, and shows that the relation between indexing and compression is much deeper than what was previously thought.
• Computer Science
Theor. Comput. Sci.
• 2019
This paper presents an implementation of a hybrid index that combines the effectiveness of Lempel-Ziv factorization with a modular design. It successfully indexes thousands of genomes on a commodity desktop and scales up to multi-terabyte collections, provided there is enough secondary memory.

## References

• Biology
J. Comput. Biol.
• 2010
New static and dynamic full-text indexes are developed that are capable of capturing the fact that a collection is highly repetitive, and require space basically proportional to the length of one typical sequence plus the total number of edit operations.
• Computer Science
SPIRE
• 2008
It is shown that the state-of-the-art entropy-bound full-text self-indexes do not yet provide satisfactory space bounds for this specific task. Some new structures that use run-length encoding are engineered, and empirical evidence is given that these structures are superior to the current ones.
• Computer Science
Algorithmica
• 2010
Stronger Lempel-Ziv based indices (LZ-indices) are presented, improving the overall performance of the original LZ-index and achieving indices requiring $(2+\epsilon)uH_k(T)+o(u\log\sigma)$ bits of space, for any constant $\epsilon>0$, which makes them the smallest existing LZ-indices.
• Computer Science
• 1996
The first sublinear-size index structure is presented; it is based on Lempel-Ziv parsing of the text and has size linear in N, the size of the Lempel-Ziv parse.
• Computer Science
LATIN
• 2014
This paper shows how, given a string $S[1..n]$ whose LZ77 parse consists of $z$ phrases, one can store a self-index for $S$ in $\mathcal{O}(z \log(n/z))$ space such that later it can be extracted in time.
• Computer Science
ESA
• 2013
Succinct and compact representations of the bidirectional BWT of a string s ∈ Σ* that provide increasing navigation power and a number of space-time tradeoffs are described, resulting in near-linear time algorithms for many sequence analysis problems for the first time in succinct space.