# Linear Time Construction of Indexable Elastic Founder Graphs

@article{Rizzo2022LinearTC,
title={Linear Time Construction of Indexable Elastic Founder Graphs},
author={Nicola Rizzo and Veli M{\"a}kinen},
journal={ArXiv},
year={2022},
volume={abs/2201.06492}
}
• Published 17 January 2022
• Computer Science, Mathematics
• ArXiv
Pattern matching on graphs has been widely studied lately due to its importance in genomics applications. Unfortunately, even the simplest problem of deciding if a string appears as a subpath of a graph admits a quadratic lower bound under the Orthogonal Vectors Hypothesis (Equi et al. ICALP 2019, SOFSEM 2021). To avoid this bottleneck, the research has shifted towards more specific graph classes, e.g. those induced from multiple sequence alignments (MSAs). Consider segmenting MSA[1..m, 1…
2 Citations

### Indexable Elastic Founder Graphs of Minimum Height

• Computer Science
CPM
• 2022
The indexable EFG minimizing the maximum prefix-aware height provides a lower bound for the original height: by exploiting suffix trees built from the MSA rows and the data structure of Belazzougui et al. answering weighted ancestor queries in constant time.

### Algorithms and Complexity on Indexing Founder Graphs

• Mathematics
Algorithmica
• 2022
A compact pangenome representation, based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA), is introduced, and it is shown that, unless the Strong Exponential Time Hypothesis fails, one cannot build an index on an elastic founder graph in polynomial time to support fast queries.

## References

### Linear Time Construction of Indexable Founder Block Graphs

• Computer Science, Mathematics
WABI
• 2020
A compact pangenome representation, based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA), is introduced, and a succinct index structure is derived to support queries of arbitrary length in the paths of the graph.

### On the Complexity of String Matching for Graphs

• Computer Science, Mathematics
ICALP
• 2019
A conditional lower bound is proved stating that, for any constant $\epsilon > 0$, an $O(|E|^{1-\epsilon} m)$-time algorithm for exact string matching in graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false.

### Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication

• Computer Science
ICALP
• 2019
By designing an appropriate reduction, it is shown that a combinatorial algorithm solving the EDSM problem in $\mathcal{O}(nm^{1.5-\epsilon} + N)$ time, for any $\epsilon>0$, refutes this conjecture.

### Indexing Graphs for Path Queries with Applications in Genome Research

• Computer Science
IEEE/ACM Transactions on Computational Biology and Bioinformatics
• 2014
The Burrows-Wheeler transform of strings is extended to acyclic directed labeled graphs to support path queries as an extension of substring searching, and several applications of such extensions are studied.

### Algorithms and Complexity on Indexing Elastic Founder Graphs

• Mathematics
ISAAC
• 2021
It is proved that even graphs induced from multiple sequence alignments are hard to index under OVH, and two subclasses that are easy to index are introduced: elastic degenerate strings and elastic founder graphs.

### Linear time minimum segmentation enables scalable founder reconstruction

• Computer Science
Algorithms for Molecular Biology
• 2019
An $O(mn)$ time (i.e. linear time in the input size) algorithm is given to solve the minimum segmentation problem for founder reconstruction, improving over an earlier $O(mn^2)$.

### GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs

• Biology
Nature Communications
• 2019
GraphTyper2, which uses pangenome graphs to genotype structural variants using short-reads and can be applied in large-scale sequencing studies, is presented.

### Succinct static data structures

This thesis investigates the problem of data optimization for some fundamental static data types, concentrating on linked data structures such as trees, and problems of finding a minimal representation for general unordered trees where pointers to children are stored in a block of consecutive locations.

### Storage and Retrieval of Highly Repetitive Sequence Collections

• Biology
J. Comput. Biol.
• 2010
New static and dynamic full-text indexes are developed that are able to capture the fact that a collection is highly repetitive, and that require space basically proportional to the length of one typical sequence plus the total number of edit operations.

### Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

• Biology, Computer Science
Nature Biotechnology
• 2019
This work presents a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina-Manzini index, and uses it to represent and search an expanded model of the human reference genome.