Linear Time Construction of Indexable Elastic Founder Graphs

@article{Rizzo2022LinearTC,
  title={Linear Time Construction of Indexable Elastic Founder Graphs},
  author={Nicola Rizzo and Veli M{\"a}kinen},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.06492}
}
Pattern matching on graphs has been widely studied lately due to its importance in genomics applications. Unfortunately, even the simplest problem of deciding if a string appears as a subpath of a graph admits a quadratic lower bound under the Orthogonal Vectors Hypothesis (Equi et al. ICALP 2019, SOFSEM 2021). To avoid this bottleneck, research has shifted towards more specific graph classes, e.g. those induced from multiple sequence alignments (MSAs). Consider segmenting MSA[1..m, 1… 
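To make the MSA-induced graph class concrete, here is a minimal Python sketch that builds the nodes and edges of an elastic founder graph from an already chosen segmentation. It assumes a hypothetical input format (rows as equal-length strings with `-` for gaps, blocks as 0-based half-open column ranges). It takes as nodes the distinct gap-free row substrings of each block and adds an edge whenever one row spells two labels in consecutive blocks. It is an illustration only, not the paper's construction algorithm, and it checks no indexability condition.

```python
def build_efg(msa, boundaries):
    """Sketch: elastic founder graph induced by a segmented MSA.

    msa: list of equal-length rows, '-' marks a gap.
    boundaries: hypothetical list of (start, end) column ranges,
                0-based and end-exclusive, covering the MSA left to right.
    Returns (blocks, edges): blocks[j] is the sorted list of distinct
    node labels of block j; edges holds (j, label_in_j, label_in_j+1).
    """
    blocks = [set() for _ in boundaries]
    edges = set()
    for row in msa:
        # Gap-free label spelled by this row inside each block
        # (an all-gap block yields the empty string, kept as-is here).
        labels = [row[s:e].replace("-", "") for s, e in boundaries]
        for j, label in enumerate(labels):
            blocks[j].add(label)
        # The same row passing through consecutive blocks induces an edge.
        for j in range(len(labels) - 1):
            edges.add((j, labels[j], labels[j + 1]))
    return [sorted(b) for b in blocks], edges


msa = ["ACGT-A", "ACGTTA", "AGGT-A"]
blocks, edges = build_efg(msa, [(0, 2), (2, 4), (4, 6)])
print(blocks)         # [['AC', 'AG'], ['GT'], ['A', 'TA']]
print(sorted(edges))
```

Removing gaps inside each block is what makes the graph "elastic": labels within a block may have different lengths.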
2 Citations

Indexable Elastic Founder Graphs of Minimum Height

The indexable EFG minimizing the maximum prefix-aware height provides a lower bound for the original height, by exploiting suffix trees built from the MSA rows and the data structure of Belazzougui et al. that answers weighted ancestor queries in constant time.

Algorithms and Complexity on Indexing Founder Graphs

A compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA) is introduced and it is shown that unless the Strong Exponential Time Hypothesis fails, one cannot build an index on an elastic founder graph in polynomial time to support fast queries.

References

Linear Time Construction of Indexable Founder Block Graphs

A compact pangenome representation is introduced, based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA), and a succinct index structure is derived to support queries of arbitrary length in the paths of the graph.

On the Complexity of String Matching for Graphs

A conditional lower bound is proved stating that, for any constant $\epsilon > 0$, an $O(|E|^{1-\epsilon}\, m)$-time algorithm for exact string matching in graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false.
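For comparison with the lower bound above, the following Python sketch is the standard dynamic programming check for exact matching on a graph whose nodes are labeled with single characters. It keeps the set of nodes at which some path spelling the current pattern prefix ends, so it runs in $O(m(|V|+|E|))$ time for a pattern of length $m$ and works on cyclic graphs too. The function name and input format are hypothetical.

```python
def occurs_in_graph(labels, edges, pattern):
    """Return True if some path of the node-labeled graph spells pattern.

    labels: dict mapping node -> single character.
    edges: iterable of directed (u, v) pairs.
    Each round scans the frontier and its out-edges once,
    giving O(m * (|V| + |E|)) total time.
    """
    if not pattern:
        return True
    succ = {v: [] for v in labels}
    for u, v in edges:
        succ[u].append(v)
    # Nodes where a path spelling pattern[:1] ends.
    frontier = {v for v, c in labels.items() if c == pattern[0]}
    for ch in pattern[1:]:
        # Extend every surviving path by one edge whose target matches ch.
        frontier = {w for v in frontier for w in succ[v] if labels[w] == ch}
        if not frontier:
            return False
    return bool(frontier)


# Binary-alphabet example: 0 -> 1 -> 2 -> 3 spells "0101", 0 -> 2 spells "00".
labels = {0: "0", 1: "1", 2: "0", 3: "1"}
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
print(occurs_in_graph(labels, edges, "0101"))  # True
print(occurs_in_graph(labels, edges, "11"))    # False
```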

Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication

By designing an appropriate reduction, it is shown that a combinatorial algorithm solving the EDSM problem in $\mathcal{O}(nm^{1.5-\epsilon} + N)$ time, for any $\epsilon>0$, refutes the combinatorial Boolean Matrix Multiplication (BMM) conjecture.

Indexing Graphs for Path Queries with Applications in Genome Research

The Burrows-Wheeler transform is extended from strings to labeled directed acyclic graphs to support path queries as an extension of substring searching, and several applications of such extensions are studied.

Algorithms and Complexity on Indexing Elastic Founder Graphs

It is proved that even graphs induced from multiple sequence alignments are hard to index under OVH, and two subclasses that are easy to index are introduced: elastic degenerate strings and elastic founder graphs.

Linear time minimum segmentation enables scalable founder reconstruction

An $O(mn)$-time (i.e. linear time in the input size) algorithm is given to solve the minimum segmentation problem for founder reconstruction, improving over an earlier $O(mn^2)$-time algorithm.
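The minimum segmentation objective can be illustrated with a brute-force dynamic program. This sketch assumes the problem is to split the columns into segments of length at least L while minimizing the largest number of distinct row substrings found in any segment. It is a simple polynomial baseline for small inputs, not the $O(mn)$ algorithm summarized above. The function name and the 0-based half-open segment format are hypothetical.

```python
import math


def min_segmentation(rows, L):
    """Brute-force DP for the (assumed) minimum segmentation objective.

    rows: equal-length strings; L: minimum segment length.
    Returns (score, segments): score is the smallest achievable maximum
    number of distinct row substrings in a segment; segments are 0-based
    half-open column ranges. Returns None if no valid segmentation exists.
    """
    n = len(rows[0])
    best = [math.inf] * (n + 1)  # best[y]: optimum for columns [0, y)
    prev = [-1] * (n + 1)        # start column of the last segment
    best[0] = 0
    for y in range(L, n + 1):
        for x in range(0, y - L + 1):  # last segment [x, y) has length >= L
            if best[x] == math.inf:
                continue
            distinct = len({r[x:y] for r in rows})
            score = max(best[x], distinct)
            if score < best[y]:
                best[y], prev[y] = score, x
    if best[n] == math.inf:
        return None
    # Walk the stored start columns back to recover the segments.
    segments, y = [], n
    while y > 0:
        segments.append((prev[y], y))
        y = prev[y]
    return best[n], segments[::-1]


rows = ["AAGG", "ACGG", "AAGT"]
print(min_segmentation(rows, 2))  # (2, [(0, 2), (2, 4)])
```

Splitting into two segments keeps each at two distinct substrings, whereas one segment over all four columns would contain three.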

GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs

GraphTyper2, which uses pangenome graphs to genotype structural variants using short reads and can be applied in large-scale sequencing studies, is presented.

Succinct static data structures

This thesis investigates the problem of data optimization for some fundamental static data types, concentrating on linked data structures such as trees, and problems of finding a minimal representation for general unordered trees where pointers to children are stored in a block of consecutive locations.

Storage and Retrieval of Highly Repetitive Sequence Collections

New static and dynamic full-text indexes are developed that are capable of capturing the fact that a collection is highly repetitive, and require space basically proportional to the length of one typical sequence plus the total number of edit operations.

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

This work presents a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index, and uses it to represent and search an expanded model of the human reference genome.