# Linear Time Construction of Indexable Elastic Founder Graphs

@article{Rizzo2022LinearTC, title={Linear Time Construction of Indexable Elastic Founder Graphs}, author={Nicola Rizzo and Veli M{\"a}kinen}, journal={ArXiv}, year={2022}, volume={abs/2201.06492} }

Pattern matching on graphs has been widely studied lately due to its importance in genomics applications. Unfortunately, even the simplest problem of deciding if a string appears as a subpath of a graph admits a quadratic lower bound under the Orthogonal Vectors Hypothesis (Equi et al. ICALP 2019, SOFSEM 2021). To avoid this bottleneck, the research has shifted towards more specific graph classes, e.g. those induced from multiple sequence alignments (MSAs). Consider segmenting MSA[1..m, 1…

## 2 Citations

### Indexable Elastic Founder Graphs of Minimum Height

- Computer Science
- CPM
- 2022

The indexable EFG minimizing the maximum prefix-aware height provides a lower bound for the original height, by exploiting suffix trees built from the MSA rows and the data structure of Belazzougui et al. for answering weighted ancestor queries in constant time.

### Algorithms and Complexity on Indexing Founder Graphs

- Mathematics
- Algorithmica
- 2022

A compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA) is introduced and it is shown that unless the Strong Exponential Time Hypothesis fails, one cannot build an index on an elastic founder graph in polynomial time to support fast queries.

## References

### Linear Time Construction of Indexable Founder Block Graphs

- Computer Science, Mathematics
- WABI
- 2020

A compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA) is introduced, and a succinct index structure is derived to support queries of arbitrary length in the paths of the graph.

### On the Complexity of String Matching for Graphs

- Computer Science, Mathematics
- ICALP
- 2019

A conditional lower bound is proved stating that, for any constant $\epsilon > 0$, an $O(|E|^{1-\epsilon} m)$-time algorithm for exact string matching in graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false.

### Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication

- Computer Science
- ICALP
- 2019

By designing an appropriate reduction, it is shown that a combinatorial algorithm solving the EDSM problem in $\mathcal{O}(nm^{1.5-\epsilon} + N)$ time, for any $\epsilon>0$, refutes this conjecture.

### Indexing Graphs for Path Queries with Applications in Genome Research

- Computer Science
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2014

The Burrows-Wheeler transform of strings to acyclic directed labeled graphs is extended to support path queries as an extension to substring searching, and several applications of such extensions are studied.

### Algorithms and Complexity on Indexing Elastic Founder Graphs

- Mathematics
- ISAAC
- 2021

It is proved that even graphs induced from multiple sequence alignments are hard to index under the Orthogonal Vectors Hypothesis (OVH), and two subclasses that are easy to index are introduced: elastic degenerate strings and elastic founder graphs.

### Linear time minimum segmentation enables scalable founder reconstruction

- Computer Science
- Algorithms for Molecular Biology
- 2019

An $O(mn)$ time (i.e. linear time in the input size) algorithm is given to solve the minimum segmentation problem for founder reconstruction, improving over an earlier $O(mn^2)$.

### GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs

- Biology
- Nature Communications
- 2019

GraphTyper2, which uses pangenome graphs to genotype structural variants using short-reads and can be applied in large-scale sequencing studies, is presented.

### Succinct static data structures

- Computer Science
- 1988

This thesis investigates the problem of data optimization for some fundamental static data types, concentrating on linked data structures such as trees, and problems of finding a minimal representation for general unordered trees where pointers to children are stored in a block of consecutive locations.

### Storage and Retrieval of Highly Repetitive Sequence Collections

- Biology
- J. Comput. Biol.
- 2010

New static and dynamic full-text indexes are developed that are capable of capturing the fact that a collection is highly repetitive, and require space basically proportional to the length of one typical sequence plus the total number of edit operations.

### Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

- Biology, Computer Science
- Nature Biotechnology
- 2019

This work presents a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina-Manzini index, and uses it to represent and search an expanded model of the human reference genome.