Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy

@article{Wang2007NaiveBC,
  title={Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy},
  author={Qiong Wang and George M. Garrity and James M. Tiedje and James R. Cole},
  journal={Applied and Environmental Microbiology},
  year={2007},
  volume={73},
  pages={5261--5267}
}
ABSTRACT The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes (2nd ed., release 5.0, Springer-Verlag, New York, NY, 2004). It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. The majority of classifications (98%) were of high estimated confidence (≥95%) and high… 
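The idea summarized in the abstract, a naïve Bayesian classifier that returns a genus assignment together with a confidence estimate, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the use of 8-base words, the word-presence smoothing formula, the bootstrap over one eighth of the query words with 100 trials, and the function names `kmers`, `train`, `log_score`, and `classify`. It is a toy model, not the RDP Classifier implementation.

```python
import math
import random
from collections import defaultdict

K = 8  # word size; an assumption for this sketch


def kmers(seq, k=K):
    """Return the set of distinct overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(labeled_seqs, k=K):
    """Count, per genus, how many training sequences contain each word."""
    genus_total = defaultdict(int)                      # sequences per genus
    genus_word = defaultdict(lambda: defaultdict(int))  # word counts per genus
    word_total = defaultdict(int)                       # word counts overall
    for genus, seq in labeled_seqs:
        genus_total[genus] += 1
        for w in kmers(seq, k):
            genus_word[genus][w] += 1
            word_total[w] += 1
    n_seqs = sum(genus_total.values())
    return genus_total, genus_word, word_total, n_seqs


def log_score(words, genus, model):
    """Sum of log P(word | genus), smoothed by a corpus-wide word prior."""
    genus_total, genus_word, word_total, n_seqs = model
    m_total = genus_total[genus]
    score = 0.0
    for w in words:
        prior = (word_total.get(w, 0) + 0.5) / (n_seqs + 1)
        score += math.log((genus_word[genus].get(w, 0) + prior) / (m_total + 1))
    return score


def classify(seq, model, k=K, trials=100, seed=0):
    """Return (best genus, fraction of bootstrap trials agreeing with it)."""
    words = sorted(kmers(seq, k))
    if not words:
        raise ValueError("query is shorter than the word size")
    rng = random.Random(seed)
    genera = sorted(model[0])
    best = max(genera, key=lambda g: log_score(words, g, model))
    subset_size = max(1, len(words) // 8)  # assumed bootstrap sample size
    votes = 0
    for _ in range(trials):
        sample = [rng.choice(words) for _ in range(subset_size)]
        winner = max(genera, key=lambda g: log_score(sample, g, model))
        votes += winner == best
    return best, votes / trials
```

Calling `classify(query, train(labeled_seqs))` returns a genus label and a value between 0 and 1. In this sketch that value is simply the share of resampled word subsets that agree with the full-sequence assignment, which is one plausible way to produce a per-assignment confidence estimate.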
Naïve Bayesian Classifiers with Multinomial Models for rRNA Taxonomic Assignment
TLDR
This study presents naïve Bayesian classifiers with multinomial models that take repetitive 8-mers into account for classifying microbial 16S and fungal 28S rRNA sequences, and demonstrates that the multinomial coefficients approach can generally achieve a higher prediction accuracy in most hypervariable regions.
Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes
TLDR
The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys and provides equal or superior classification accuracy.
Taxonomy annotation errors in 16S rRNA and fungal ITS sequence databases
TLDR
The error rates of taxonomy annotations in specialized ribosomal sequence databases, including Greengenes, SILVA and RDP, are investigated to estimate the error rate.
From Genus to Phylum: Large-Subunit and Internal Transcribed Spacer rRNA Operon Regions Show Similar Classification Accuracies Influenced by Database Composition
TLDR
The results show that any of the ITS or LSU sections tested provided comparable classification accuracy to the genus level and underscore the need for larger and more diverse classification training sets.
Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms
TLDR
It is concluded that selecting a read-length appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy and the detector is a good predictor to determine novel abundant taxa.
Taxonomy annotation and guide tree errors in 16S rRNA databases
TLDR
The branching orders of the Greengenes and SILVA guide trees were found to disagree at comparable rates with each other and with taxonomy annotations according to the training set provided by RDP, indicating that the trees have comparable quality.
A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy
TLDR
This work has developed a method that shows significantly improved species-level classification results over existing methods and provides probabilistic-based confidence scores to evaluate the reliability of the taxonomic classification assignments based on multiple database matches to query sequences.
TUIT, a BLAST-based tool for taxonomic classification of nucleotide sequences.
TLDR
TUIT (Taxonomic Unit Identification Tool) is introduced: an efficient, open-source and platform-independent application that can perform taxonomic classification on its own or can be used in combination with the RDP II Classifier to maximize the taxonomic identification rate.
Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences
TLDR
An assessment of the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database, found 95% identity to be a twilight zone where taxonomy is highly ambiguous.
Construction & assessment of a unified curated reference database for improving the taxonomic classification of bacteria using 16S rRNA sequence data
TLDR
The results indicate that for analysis of bacterial mixtures, sequencing of V2-V3 region of 16S rRNA followed by analysis of the data using the mothur-nbc classifier and the 16S-UDb database may be preferred.

References

Capturing whole-genome characteristics in short sequences using a naïve Bayesian classifier.
TLDR
A naïve Bayesian classifier was developed. Sequences as short as 400 bases could be correctly classified with an accuracy of 85%, and it was found that this classification methodology could be a valuable tool in biodiversity studies.
Comprehensive aligned sequence construction for automated design of effective probes (CASCADE-P) using 16S rDNA
TLDR
The main focus of creating the prokMSA was to provide a comprehensive, categorized, updateable 16S rDNA collection useful as a foundation for any probe selection algorithm.
The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis
The Ribosomal Database Project (RDP-II) provides the research community with aligned and annotated rRNA gene sequences, along with analysis services and a phylogenetically consistent taxonomic
UniFrac: a New Phylogenetic Method for Comparing Microbial Communities
TLDR
The results illustrate that UniFrac provides a new way of characterizing microbial communities, using the wealth of environmental rRNA sequences, and allows quantitative insight into the factors that underlie the distribution of lineages among environments.
Phylogenetic heterogeneity of the genus Bacillus revealed by comparative analysis of small‐subunit‐ribosomal RNA sequences
TLDR
It is evident that the genus Bacillus is genetically extremely heterogeneous and requires extensive taxonomic revision, and the rRNA structures defined in the present study will provide a firm basis for the division of Bacillus into several phylogenetically distinct genera.
The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs
TLDR
This online RNA sequence and structure information, the result of extensive analysis, interpretation, data collection, and computer program and web development, is accessible at the Comparative RNA Web (CRW) Site.
Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya.
TLDR
It is proposed that a formal system of organisms be established in which above the level of kingdom there exists a new taxon called a "domain." Life on this planet would be seen as comprising three domains, the Bacteria, the Archaea, and the Eucarya, each containing two or more kingdoms.
Phylogenetic Approaches for Describing and Comparing the Diversity of Microbial Communities
  • Andrew P. Martin
  • Environmental Science
    Applied and Environmental Microbiology
  • 2002
TLDR
It is shown that information gained from analysis of DNA sequences provides the basis for statistical analysis of communities in ways that advance inferences about the processes that may govern the compositions and functions of microbial communities.
Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction.
TLDR
Weighbor appears to be relatively immune to the "long branches attract" and "long branch distracts" drawbacks observed with neighbor joining, BIONJ, and parsimony, and is much faster, while building trees that are qualitatively and quantitatively similar.
Compilation of small ribosomal subunit RNA structures
The database on small ribosomal subunit RNA structure contained 1804 nucleotide sequences on April 23, 1993. This number comprises 365 eukaryotic, 65 archaeal, 1260 bacterial, 30 plastidial, and 84