Efficient cardinality estimation for k-mers in large DNA sequencing data sets

Abstract

We present an open implementation of the HyperLogLog cardinality estimation sketch for counting fixed-length substrings of DNA strings (“k-mers”). The HyperLogLog sketch implementation is in C++ with a Python interface, and is distributed as part of the khmer software package. khmer is freely available from https://github.com/dib-lab/khmer under a BSD License. The features presented here are included in version 1.4 and later.

bioRxiv preprint first posted online Jun. 7, 2016; doi: http://dx.doi.org/10.1101/056846. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
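To make the idea of the sketch concrete, here is a minimal pure-Python illustration of HyperLogLog applied to k-mers. It is only a sketch under stated assumptions and is not khmer's C++ implementation or its Python interface: the names `kmers` and `TinyHLL`, the default precision `p=12`, and the use of BLAKE2b as the hash function are all hypothetical choices made for this example.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class TinyHLL:
    """Illustrative HyperLogLog with 2**p registers (hypothetical, not khmer's code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash via BLAKE2b; an arbitrary choice for this example.
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting on empty registers).
            return m * math.log(m / zeros)
        return raw


# Usage: estimate the number of distinct 4-mers in a short sequence.
hll = TinyHLL()
for kmer in kmers("ACGTACGTTTGACCA", 4):
    hll.add(kmer)
print(round(hll.estimate()))
```

Memory use is fixed by the number of registers (here 2**12) regardless of how many k-mers are added, which is what makes this kind of sketch attractive for large sequencing data sets. With m registers the expected relative error is roughly 1.04 / sqrt(m).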

Cite this paper

@inproceedings{Irber2016EfficientCE,
  title={Efficient cardinality estimation for k-mers in large DNA sequencing data sets},
  author={Luiz C. Irber and Titus Brown},
  year={2016}
}