We have constructed a map of the "protein structure space" by using the pairwise structural similarity scores calculated for all nonredundant protein structures determined experimentally. As expected, proteins with similar structures clustered together in the map, and the overall distribution of structural classes of this map followed closely that of the map…
Ten complete mammalian genome sequences were compared by using the "feature frequency profile" (FFP) method of alignment-free comparison. This comparison technique reveals that the whole nongenic portion of mammalian genomes contains evolutionary information that is similar to their genic counterparts, the intron and exon regions. We partitioned the…
Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now…
The bacterial microbiota of plants is diverse, with thousands of operational taxonomic units (OTUs) associated with any individual plant. In this work, we used phenotypic analysis, comparative genomics, and metabolic models to investigate the differences between 19 sequenced Pseudomonas fluorescens strains. These isolates represent a single OTU and were…
BACKGROUND: More than 80% of the microbial genomes in GenBank are of 'draft' quality (12,553 draft vs. 2,679 finished, as of October 2013). We have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than…
The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of…
The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All…
It has been 30 years since the initial emergence and subsequent rapid global spread of multidrug-resistant Salmonella enterica serovar Typhimurium DT104 (MDR DT104). Nonetheless, its origin and transmission route have never been revealed. We used whole-genome sequencing (WGS) and temporally structured sequence analysis within a Bayesian framework to…
For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively…