# Fast Label Extraction in the CDAWG

@article{Belazzougui2017FastLE,
title={Fast Label Extraction in the CDAWG},
author={Djamal Belazzougui and Fabio Cunial},
journal={ArXiv},
year={2017},
volume={abs/1707.08197}
}
• Published 25 July 2017
• Computer Science
• ArXiv
The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to count the number of occurrences of a pattern of length $m$, using an existing data…

### Online Algorithms for Constructing Linear-size Suffix Trie

• Computer Science
CPM
• 2019
Two types of online algorithms are presented that 'directly' construct the LST, from right to left and from left to right, without constructing the suffix tree as an intermediate structure.

### Fully-functional bidirectional Burrows-Wheeler indexes

• Computer Science
ArXiv
• 2019
An index that supports bidirectional addition and removal in $O(\log{\log{|T|}})$ time, and that occupies a number of words proportional to the number of left and right extensions of the maximal repeats of $T$.

### Optimal-Time Text Indexing in BWT-runs Bounded Space

• Computer Science
SODA
• 2018

### Linear-Size CDAWG: New Repetition-Aware Indexing and Grammar Compression

• Computer Science
SPIRE
• 2017
In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs

### Composite Repetition-Aware Data Structures

• Computer Science
CPM
• 2015
Two data structures are described whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure.

### Succinct Suffix Arrays based on Run-Length Encoding

• Computer Science
Nord. J. Comput.
• 2005
A new self-index is presented, called the RLFM index for "run-length FM-index", that counts the occurrences of P in T in O(m) time when the alphabet size is σ = O(polylog(n)). It is also shown that the RLFM index can be enhanced to locate occurrences in the text and display text substrings in time independent of σ.

### Fully compressed suffix trees

• Computer Science
TALG
• 2011
This article introduces the first compressed suffix tree representation that requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time.

### Linear-size suffix tries

• Computer Science
Theor. Comput. Sci.
• 2016

### Fast Fully-Compressed Suffix Trees

• Computer Science
2014 Data Compression Conference
• 2014
This work significantly accelerates the fully-compressed suffix tree representation (FCST), and the resulting FCST variant becomes very attractive in terms of space and time, and a promising alternative in practice.

### Finding Level-Ancestors in Trees

• Computer Science
J. Comput. Syst. Sci.
• 1994

### Storage and Retrieval of Highly Repetitive Sequence Collections

• Biology
J. Comput. Biol.
• 2010
New static and dynamic full-text indexes are developed that are capable of capturing the fact that a collection is highly repetitive, and that require space basically proportional to the length of one typical sequence plus the total number of edit operations.

### Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections

• Computer Science
SPIRE
• 2008
It is shown that the state-of-the-art entropy-bound full-text self-indexes do not yet provide satisfactory space bounds for this specific task. Some new structures that use run-length encoding are engineered, and empirical evidence is given that these structures are superior to the current structures.