A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea

Dongying Wu, P. Bernt Hugenholtz, Konstantinos Mavromatis, Rüdiger Pukall, Eileen Dalin, Natalia N. Ivanova, Victor Kunin, Lynne A. Goodwin, Martin Wu, Brian J. Tindall, Sean D. Hooper, Amrita Pati, Athanasios Lykidis, Stefan Spring, Iain Anderson, Patrik D’haeseleer, Adam T. Zemla, Mitchell Singer, Alla L. Lapidus, Matt Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng, Susan M. Lucas, Cheryl A. Kerfeld, Elke Lang, Sabine Gronow, Patrick S. G. Chain, David G. Bruce, Edward M. Rubin, Nikos C. Kyrpides, Hans-Peter Klenk, Jonathan A. Eisen
Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms. There are now nearly 1,000 completed bacterial and archaeal genomes available, most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly biased phylogenetic distribution. To explore the value added by choosing microbial genomes for sequencing on the basis of… 
Expansion of the Genomic Encyclopedia of Bacteria and Archaea
Generating these reference genomes of uncultured microbes will dramatically increase the discovery rate of novel protein families and biological functions, shed light on the numerous underrepresented phyla that likely play important roles in the environment, and assist in improving the reconstruction of the evolutionary history of Bacteria and Archaea.
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices
It is concluded that the current best approach for generating a single phylogenetic tree, suitable for use as a reference phylogeny for comparative analyses, is to perform a maximum likelihood analysis of a concatenated alignment of conserved, single-copy genes.
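As a rough illustration of the "concatenated alignment" step mentioned in the summary above, the following minimal sketch joins per-gene alignments into a single supermatrix. It is not the authors' pipeline: the function name `concatenate_alignments`, the dict-based input format, and the toy sequences are all assumptions made for this example, and the maximum likelihood analysis itself would be run afterwards in a dedicated phylogenetics program.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one supermatrix.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence
    (hypothetical input format chosen for this sketch).
    Returns (supermatrix, partitions), where partitions holds the 1-based,
    inclusive column range of each gene in the concatenated alignment.
    """
    # Every taxon seen in any gene gets a row in the supermatrix.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon missing this gene is padded with gap characters.
            rows[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return supermatrix, partitions


# Toy data: two made-up gene alignments; taxonB lacks the second gene.
genes = [
    {"taxonA": "MKV-L", "taxonB": "MKVAL", "taxonC": "MRVAL"},
    {"taxonA": "GGS", "taxonC": "GAS"},
]
matrix, partitions = concatenate_alignments(genes)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(partitions)
```

The partition ranges are kept so that a downstream analysis can, if desired, fit a separate substitution model to each gene.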
1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life
We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space.
Estimate of the sequenced proportion of the global prokaryotic genome
The current state of prokaryotic genome sequencing across Earth's biomes is revealed, and large-scale alignment between sequences released by the Earth Microbiome Project and public databases provides a more reasonable and efficient exploration of prokaryotic genomes and promotes the understanding of microbial ecological functions.
Insights into the phylogeny and coding potential of microbial dark matter
This study applies single-cell genomics to target and sequence 201 archaeal and bacterial cells from nine diverse habitats, belonging to 29 major, mostly uncharted branches of the tree of life, and provides a systematic step towards a better understanding of biological evolution on our planet.
Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life
The recovery of 7,903 bacterial and archaeal metagenome-assembled genomes increases the phylogenetic diversity represented by public genome repositories and provides the first representatives from 20 candidate phyla.
A genomic catalog of Earth’s microbiomes
The utility of this collection of >10,000 metagenomes collected from diverse habitats covering all of Earth’s continents and oceans is demonstrated for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses.
Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea.
J. Chun, F. Rainey · Biology · International Journal of Systematic and Evolutionary Microbiology · 2014
This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics, and outlines approaches for incorporating genomics into workflows that run from new strain isolation to new taxon description.
Proteogenomics of rare taxonomic phyla: A prospective treasure trove of protein coding genes
All bacterial and archaeal proteogenomic studies carried out to date are collated and reviewed, and targeted proteogenomics of underexplored taxa is urged in order to build an extensive reference of protein-coding genes.


A molecular view of microbial diversity and the biosphere.
Over three decades of molecular-phylogenetic studies, researchers have compiled an increasingly robust map of evolutionary diversification showing that the main diversity of life is microbial,
Complete Genome Sequence of the Aerobic CO-Oxidizing Thermophile Thermomicrobium roseum
It is proposed that glycosylation of its carotenoids plays a crucial role in the adaptation of the cell membrane to this bacterium's thermophilic lifestyle and suggests a straightforward means for lateral transfer of flagellum-based motility.
The impact of reticulate evolution on genome phylogeny.
Using replicated simulations of genome evolution, it is shown that different scenarios of lateral genetic transfer have significant impacts on the ability to recover the "true" tree of genomes, even when corrections for phylogenetically discordant signals are used.
Exploring prokaryotic diversity in the genomic era
Our understanding of prokaryote biology from study of pure cultures and genome sequencing has been limited by a pronounced sampling bias towards four bacterial phyla - Proteobacteria, Firmicutes,
Microbial diversity and the genetic nature of microbial species
It is proposed that decisions on the existence of species and methods to define them should be guided by a method-free species concept that is based on cohesive evolutionary forces.
Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB
There is incongruent taxonomic nomenclature among curators even at the phylum level, and environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.
Protein interaction maps for complete genomes based on gene fusion events
It is shown that 215 genes or proteins in the complete genomes of Escherichia coli, Haemophilus influenzae and Methanococcus jannaschii are involved in 64 unique fusion events, which is able to predict functional associations of proteins.
A Bioinformatician's Guide to Metagenomics
The chain of decisions accompanying a metagenomic project is described step by step from the viewpoint of bioinformatic analysis, with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies.