Khmer Release V2.1: Software for Biological Sequence Analysis

@article{Standage2017KhmerRV,
  title={Khmer Release V2.1: Software for Biological Sequence Analysis},
  author={Daniel S. Standage and Ali Aliyari and Lisa J. Cohen and Michael R. Crusoe and Tim Head and Luiz C. Irber and Shannon E. K. Joslin and N. B. Kingsley and Kevin D. Murray and Russell Y. Neches and Camille Scott and Ryan C. Shean and Sascha Steinbiss and Cait Sydney and C. Titus Brown},
  journal={J. Open Source Softw.},
  year={2017},
  volume={2},
  pages={272}
}
1 Lab for Data Intensive Biology; School of Veterinary Medicine; University of California, Davis
2 Integrative Genetics and Genomics Graduate Group; University of California, Davis
3 Molecular, Cellular, and Integrative Physiology Graduate Group; University of California, Davis
4 Common Workflow Language Project
5 Wild Tree Tech
6 Computer Science Graduate Group; University of California, Davis
7 ARC Centre of Excellence in Plant Energy Biology; Australian National University
8 Microbiology Graduate…
4 Citations
Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity
TLDR
An information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome is introduced.
Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity
TLDR
An information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome is introduced.
Kevlar: a mapping-free framework for accurate discovery of de novo variants
TLDR
A mapping-free method, Kevlar, for de novo variant discovery based on direct comparison of sequence content between related individuals is developed, which utilizes a novel probabilistic approach to score and rank the variant predictions to identify the most likely de novo variants.

References

A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
TLDR
Digital normalization is described, a single-pass computational algorithm that systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors.
The khmer software package: enabling efficient nucleotide sequence analysis
The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data
Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
TLDR
A memory-efficient graph representation based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly, is introduced, which reduces the overall memory requirements for de novo assembly of metagenomes.
Crossing the streams: a framework for streaming analysis of short DNA sequencing reads
We present a semi-streaming algorithm for k-mer spectral analysis of DNA sequencing reads, together with a derivative approach that is fully streaming. The approach can also be applied to genomic,
These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure
TLDR
The speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers are analyzed.
khmer release v2.1: software for biological sequence analysis
Journal of Open Source Software, 2017