39 Citations
Practical and Flexible Indexes on Repetitive String Collections
- Computer Science
- 2019
The main goal is to develop practical and flexible succinct indexes to support pattern matching and document retrieval operations on repetitive string collections.
On Locating Paths in Compressed Cardinal Trees
- Computer Science, ArXiv
- 2020
This paper shows for the first time how to support the powerful locate queries on compressed trees, and proposes suitable generalizations of run-length BWT, high-order entropy, and string attractors to cardinal trees (tries).
Subpath Queries on Compressed Graphs: a Survey
- Computer Science, Algorithms
- 2021
This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages.
Indexing Highly Repetitive String Collections, Part I: Repetitiveness Measures
- Computer Science
- 2020
This survey describes the distinct compression paradigms that have been used to exploit repetitiveness, and the algorithmic techniques that provide direct access to the compressed strings.
Indexing Highly Repetitive String Collections, Part II: Compressed Indexes
- Computer Science
- 2020
This survey covers the fundamental algorithmic ideas and data structures that form the base of all the existing indexes, and the various concrete structures that have been proposed, comparing them both in theoretical and practical aspects, and uncovering some new combinations.
Block Tree based Universal Self-Index for Repetitive Text Collections
- Computer Science
- 2020
Being able to manipulate text within compressed space, with compression related to its repetitiveness, is critically important in many areas of study, such as Bioinformatics, Information Retrieval, and Data Mining.
Towards a Definitive Compressibility Measure for Repetitive Sequences
- Computer Science, IEEE Transactions on Information Theory
- 2022
This paper studies a measure δ ≤ γ that is even smaller than γ, can be computed in linear time, is monotone, and allows encoding every string in O(δ log(n/δ)) space, and argues that δ better captures the compressibility of repetitive strings.
Optimal-Time Queries on BWT-Runs Compressed Indexes
- Computer Science, ICALP
- 2021
This paper presents the first compressed index on RLBWT, called R-index-f, that supports various queries, including locate, count, extract, decompression, and prefix search, in optimal time with a smaller working space of $O(r)$ words for small alphabets.
Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space
- Computer Science, J. ACM
- 2020
This article shows how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently (in O(occ log log n) time) within O(r) space, and outperforms the space-competitive alternatives by 1-2 orders of magnitude in time.
Indexing Highly Repetitive String Collections, Part II
- Computer Science, ACM Comput. Surv.
- 2022
This survey covers the fundamental algorithmic ideas and data structures that form the base of all the existing indexes, and the various concrete structures that have been proposed, comparing them both in theoretical and practical aspects, and uncovering some new combinations.
References
Self-Indexed Grammar-Based Compression
- Computer Science, Fundam. Informaticae
- 2011
This paper introduces the first grammar-based self-index, together with a representation of SLPs that takes 2n log_2 n (1 + o(1)) bits and efficiently supports more operations than a plain array of rules, and a representation for binary relations with labels that supports various extended queries.
Optimal-Time Text Indexing in BWT-runs Bounded Space
- Computer Science, SODA
- 2018
This paper shows how to extend the Run-Length FM-index so that it can locate the occurrences of a pattern efficiently within $O(r)$ space (in log-logarithmic time each), and how to reach optimal time $O(m+occ)$ within $O(r\log(n/r))$ space, on a RAM machine of $w=\Omega(\log n)$ bits.
Composite Repetition-Aware Data Structures
- Computer Science, CPM
- 2015
Two data structures are described whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure.
Collage system: a unifying framework for compressed pattern matching
- Computer Science, Theor. Comput. Sci.
- 2003
Sparse Suffix Tree Construction in Optimal Time and Space
- Computer Science, SODA
- 2017
A linear-time Monte Carlo algorithm is designed for sparse suffix tree construction, and this algorithm is complemented with a deterministic verification procedure that improves upon the bound of O(n log b) obtained by I et al.
Data compression via textual substitution
- Computer Science, JACM
- 1982
A general model for data compression that includes most data compression systems in the literature as special cases is presented, and trade-offs between different varieties of macro schemes, exact lower bounds on the amount of compression obtainable, and the complexity of encoding and decoding are discussed.
Time-space trade-offs for Lempel-Ziv compressed indexing
- Computer Science, Theor. Comput. Sci.
- 2017
A Faster Grammar-Based Self-index
- Computer Science, LATA
- 2012
This paper shows how, given a balanced straight-line program for a string S[1..n] whose LZ77 parse consists of z phrases, one can add O(z log log z) words and obtain a compressed self-index for S such that it can list the occ occurrences of P in S in O(m^2 + (m + occ) log log n) time.