Hybrid Indexes for Repetitive Datasets

Hector Ferrada, Travis Gagie, Tommi Hirvola, Simon J. Puglisi
Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences, volume 372, issue 2016
Advances in DNA sequencing mean that databases of thousands of human genomes will soon be commonplace. In this paper, we introduce a simple technique for reducing the size of conventional indexes on such highly repetitive texts. Given upper bounds on pattern lengths and edit distances, we pre-process the text with the lossless data compression algorithm LZ77 to obtain a filtered text, for which we store a conventional index. Later, given a query, we find all matches in the filtered text, then…
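The filtering step can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: that the filtered text keeps only the characters within `max_len` positions of an LZ77 phrase boundary, and that each dropped run is replaced by a separator symbol. The function names, the quadratic-time parser, the `$` separator, and the use of `str.find` in place of a conventional index are all hypothetical choices made for illustration. The sketch handles exact matching only (no edit distance) and reports only the matches that lie in the filtered text; recovering the remaining occurrences is not shown.

```python
def lz77_phrase_starts(text):
    """Greedy LZ77-style parse (quadratic time, illustration only).

    Each phrase is the longest prefix of the remaining text that also
    occurs starting at an earlier position, plus one explicit character.
    Returns the start index of every phrase.
    """
    starts, i, n = [], 0, len(text)
    while i < n:
        starts.append(i)
        length = 0
        # Extend while text[i:i+length+1] has an occurrence starting before i.
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += length + 1
    return starts


def filtered_text(text, starts, max_len, sep="$"):
    """Keep characters within max_len of a phrase boundary (assumed rule).

    Returns the filtered string and a list mapping each filtered position
    to its position in the original text (-1 for separators).
    `sep` is assumed not to occur in `text`.
    """
    n = len(text)
    keep = [False] * n
    for b in starts + [n]:  # the end of the text counts as a boundary too
        for p in range(max(0, b - max_len), min(n, b + max_len)):
            keep[p] = True
    out, pos = [], []
    for p in range(n):
        if keep[p]:
            out.append(text[p])
            pos.append(p)
        elif out and out[-1] != sep:
            out.append(sep)  # one separator per dropped run
            pos.append(-1)
    return "".join(out), pos


def find_in_filtered(filtered, pos, pattern):
    """Report original-text positions of pattern matches in the filtered text."""
    hits = []
    j = filtered.find(pattern)
    while j != -1:
        hits.append(pos[j])
        j = filtered.find(pattern, j + 1)
    return hits


if __name__ == "__main__":
    # A toy repetitive "genome": two near-identical copies of a repeat.
    text = "GATTACA" * 20 + "C" + "GATTACA" * 20
    starts = lz77_phrase_starts(text)
    filtered, pos = filtered_text(text, starts, max_len=4)
    print(len(text), len(filtered))
    print(find_in_filtered(filtered, pos, "TTAC"))
```

On highly repetitive input the parse has few phrases, so the filtered text is much shorter than the original; that is the size reduction the abstract refers to. Because patterns are bounded by `max_len`, any match that touches a phrase boundary falls entirely inside one kept run, which is why a separator between runs does not hide it.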