A complete domain-to-species taxonomy for Bacteria and Archaea

@article{Parks2020ACD,
  title={A complete domain-to-species taxonomy for Bacteria and Archaea},
  author={Donovan H. Parks and Maria Chuvochina and Pierre-Alain Chaumeil and Christian Rinke and Aaron J. Mussig and Philip Hugenholtz},
  journal={Nature Biotechnology},
  year={2020},
  pages={1--8}
}
The Genome Taxonomy Database is a phylogenetically consistent, genome-based taxonomy that provides rank-normalized classifications for ~150,000 bacterial and archaeal genomes from domain to genus. However, almost 40% of the genomes in the Genome Taxonomy Database lack a species name. We address this limitation by using commonly accepted average nucleotide identity criteria to set bounds on species and propose species clusters that encompass all publicly available bacterial and archaeal genomes… 
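The idea of using an average nucleotide identity (ANI) cutoff to bound species can be illustrated with a minimal sketch. It assumes a 95% ANI threshold (the value reported in the FastANI entry in the reference list) and a simple greedy assignment to cluster representatives. The function names, the input format, and the greedy strategy are illustrative assumptions, not the procedure used by the authors.

```python
from typing import Dict, FrozenSet, List


def pair_ani(ani: Dict[FrozenSet[str], float], a: str, b: str) -> float:
    """Look up the ANI (in percent) for an unordered genome pair; 0.0 if missing."""
    return ani.get(frozenset((a, b)), 0.0)


def cluster_by_ani(
    genomes: List[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 95.0,  # assumed species cutoff, in percent ANI
) -> Dict[str, List[str]]:
    """Greedy clustering: a genome joins the first representative it matches
    at or above the threshold, otherwise it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        for representative in clusters:
            if pair_ani(ani, genome, representative) >= threshold:
                clusters[representative].append(genome)
                break
        else:
            # No existing representative is close enough.
            clusters[genome] = [genome]
    return clusters


# Hypothetical ANI values for three made-up genomes.
example_ani = {
    frozenset(("g1", "g2")): 97.2,
    frozenset(("g1", "g3")): 88.0,
    frozenset(("g2", "g3")): 87.5,
}
example_clusters = cluster_by_ani(["g1", "g2", "g3"], example_ani)
```

In this toy input, g1 and g2 fall in one cluster and g3 forms its own. The result of a greedy scheme depends on the order in which genomes are visited, so the choice of representatives matters in practice.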

A standardized archaeal taxonomy for the Genome Taxonomy Database.

A standardized archaeal taxonomy is proposed, derived from a 122-concatenated-protein phylogeny, that resolves polyphyletic groups and normalizes ranks based on relative evolutionary divergence, an approach shown on simulated datasets to robustly correct for substitution rates varying up to 30-fold.

Resolving widespread incomplete and uneven archaeal classifications based on a rank-normalized genome-based taxonomy

A standardized archaeal taxonomy is proposed, as part of the Genome Taxonomy Database (GTDB), derived from a 122 concatenated protein phylogeny that resolves polyphyletic groups and normalizes ranks based on relative evolutionary divergence (RED).

Naming the unnamed: over 65,000 Candidatus names for unnamed Archaea and Bacteria in the Genome Taxonomy Database.

This work exploits an approach to the generation of well-formed arbitrary Latinate names at a scale sufficient to name tens of thousands of unnamed taxa within GTDB.

GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy

Prokaryotic diversity is explored from the perspective of the GTDB, the importance of metagenome-assembled genomes in expanding available genomic representation is highlighted, and the use of average nucleotide identities as a pragmatic approach for delineating prokaryotic species is discussed.

It is time for a new type of type to facilitate naming the microbial world

Diversity, function and evolution of marine microbe genomes

The database provides a comprehensive resource for the marine microbiome and a valuable reference for studies of the origin and evolution of marine life, ecological monitoring and protection, and bioactive compound development.

Functional and evolutionary significance of unknown genes from uncultivated taxa

A global multi-habitat dataset is analyzed and 980 previously neglected protein families that can accurately distinguish entire uncultivated phyla, classes, and orders are found, likely representing synapomorphic traits that fostered their divergence.

Microbial Phylogenetic Context Using Phylogenetic Outlines

A new interactive graphical tool is provided that addresses the phylogenetic context of a draft genome using Mash sketches to compare against all bacterial and archaeal representative genomes in the Genome Taxonomy Database taxonomy, all within the framework of SplitsTree5.

References

A Genus Definition for Bacteria and Archaea Based on a Standard Genome Relatedness Index

This study proposes a genus definition that relies on the combined use of the average nucleotide identity, genome alignment fraction, and the distinction between type and non-type species, and finds that genomic coherence is an emergent property of genera in Bacteria and Archaea.

A genus definition for Bacteria and Archaea based on genome relatedness and taxonomic affiliation

Results show that a distinct difference between distant relatives and close relatives at the genome level (i.e., genomic coherence) is an emergent property of genera in Bacteria and Archaea.

Towards a Genome-Based Taxonomy for Prokaryotes

The AAI-based approach provides a means to evaluate the robustness of alternative genetic markers for phylogenetic purposes, and could contribute significantly to a genome-based taxonomy for all microbial organisms.

Microbial species delineation using whole genome sequences

This work demonstrates that the combination of gANI and the alignment fraction between two genomes accurately reflects their genomic relatedness, and proposes this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, to be used to address previous inconsistencies in species classification.
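A pairwise check combining alignment fraction (AF) and gANI can be sketched as follows. The default cutoffs (AF of at least 0.6 and gANI of at least 96.5%) are assumptions included for illustration only and should be confirmed against the MiSI publication; the function name is hypothetical.

```python
def same_species(
    af: float,
    gani: float,
    min_af: float = 0.6,      # assumed alignment-fraction cutoff (0 to 1)
    min_gani: float = 96.5,   # assumed gANI cutoff, in percent
) -> bool:
    """Return True only when both the alignment fraction and the gANI
    of a genome pair meet their respective cutoffs."""
    return af >= min_af and gani >= min_gani
```

Requiring both criteria means that a high identity computed over a small aligned portion of the genomes is not enough on its own to call two genomes conspecific.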

Genomic insights that advance the species definition for prokaryotes.

The average nucleotide identity of the shared genes between two strains was found to be a robust means of comparing genetic relatedness among strains, and ANI values of approximately 94% were found to correspond to the traditional 70% DNA-DNA reassociation standard of the current species definition.

High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries

FastANI, a method to compute ANI using alignment-free approximate sequence mapping, is developed, and an analysis of about 90,000 prokaryotic genomes shows that 95% ANI is an accurate threshold for demarcating prokaryotic species.

1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space.

Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea

A reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, is built, providing an updated view of domain-level relationships between Archaea and Bacteria.

The Microbial Genomes Atlas (MiGA) webserver: taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome level

The Microbial Genomes Atlas (MiGA) is a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity concepts.

The bacterial species definition in the genomic era

The analysis of five important bacterial groups suggests, however, that more stringent standards for species may be justifiable when a solid understanding of gene content and ecological distinctiveness becomes available, and that the idea of biologically meaningful clusters of diversity may not be universally applicable in the microbial world.