The Ribosomal Database Project (RDP; http://rdp.cme.msu.edu/) provides the research community with aligned and annotated rRNA gene sequence data, along with tools that allow researchers to analyze their own rRNA gene sequences in the RDP framework. RDP data and tools are utilized in fields as diverse as human health, microbial ecology, environmental microbiology, …
Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present …
Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that …
From July 18 to 24, 2010, 26 leading microbial ecology, computation, bioinformatics and statistics researchers came together in Snowbird, Utah (USA) to discuss the challenge of how best to characterize the microbial world using next-generation sequencing technologies. The meeting was entitled "Terabase Metagenomics" and was sponsored by the …
The khmer package is a freely available software library for working efficiently with fixed-length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely …
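As an illustration of the digital normalization idea named above, here is a minimal sketch in plain Python. It does not use khmer or reproduce its API. It assumes the commonly described rule of keeping a read only while the median abundance of its k-mers, counted over the reads kept so far, is below a cutoff. The function names `kmers` and `normalize`, the parameters `k` and `cutoff`, and the use of an exact `Counter` in place of a probabilistic structure are all choices made for this example, and reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer (fixed-length DNA word) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize(reads, k=4, cutoff=2):
    """Keep a read only if the median count of its k-mers, taken over
    the reads kept so far, is below cutoff (illustrative rule)."""
    counts = Counter()  # exact counts; a hypothetical stand-in for a sketch
    kept = []
    for read in reads:
        words = list(kmers(read, k))
        if not words:
            continue  # read is shorter than k, nothing to count
        if median(counts[w] for w in words) < cutoff:
            kept.append(read)
            counts.update(words)  # only kept reads contribute to the counts
    return kept


reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
print(normalize(reads))
```

With these toy reads, the repeated read is kept twice and then discarded once its median k-mer count reaches the cutoff, while the novel read is always kept.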
Using an artificial system of self-replicating strings, we show a correlation between the age of a genotype and its abundance that reflects a punctuated rather than gradual picture of evolution, as suggested long ago by Willis. In support of this correlation, we measure genotype abundance distributions and find universal coefficients. Finally, we propose a …
The genome of the food-borne pathogen Campylobacter jejuni contains multiple highly mutable sites, or contingency loci. It has been suggested that standing variation at these loci is a mechanism for rapid adaptation to a novel environment, but this phenomenon has not been shown experimentally. In previous work we showed that the virulence of C. jejuni …
K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory-efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data …
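To make the idea of memory-efficient online k-mer counting concrete, the following is a minimal sketch of a Count-Min-Sketch-style counter, one common probabilistic counting structure. Whether it matches khmer's internals is an assumption, and the code does not use khmer's API. The class name `KmerSketch`, the function `count_kmers`, and the `width` and `depth` parameters are hypothetical names chosen for this example.

```python
import hashlib


class KmerSketch:
    """Fixed-memory approximate counter: depth rows of width cells each."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth  # must stay below 256 for the one-byte salt used here
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, kmer):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, kmer):
        for row, col in self._slots(kmer):
            self.tables[row][col] += 1

    def count(self, kmer):
        # Collisions can only inflate a cell, so the minimum over rows
        # never undercounts the true abundance.
        return min(self.tables[row][col] for row, col in self._slots(kmer))


def count_kmers(reads, k, sketch):
    """Stream reads once, adding every overlapping k-mer to the sketch."""
    for read in reads:
        for i in range(len(read) - k + 1):
            sketch.add(read[i:i + k])


sketch = KmerSketch()
count_kmers(["ACGTACGT"], k=4, sketch=sketch)
print(sketch.count("ACGT"))
```

Memory use is fixed by `width * depth` regardless of how many distinct k-mers are seen, and counts can be queried at any point during the stream. The trade-off is that reported counts may overestimate, but never underestimate, the true abundance.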