CDD: a Conserved Domain Database for protein classification

  title={CDD: a Conserved Domain Database for protein classification},
  author={Aron Marchler-Bauer and John B. Anderson and Praveen F. Cherukuri and Carol DeWeese-Scott and Lewis Y. Geer and Marc Gwadz and Siqian He and David I. Hurwitz and John D. Jackson and Zhaoxi Ke and Christopher J. Lanczycki and Cynthia A. Liebert and Chunlei Liu and Fu Lu and Gabriele H. Marchler and Mikhail Mullokandov and Benjamin A. Shoemaker and Vahan Simonyan and James S. Song and Paul A. Thiessen and Roxanne A. Yamashita and Jodie J. Yin and Dachuan Zhang and Stephen H. Bryant},
  journal={Nucleic Acids Research},
  pages={D192 - D196}
The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed®, and can be accessed online. CD-Search, which is also available online, is a fast, interactive tool to identify conserved domains in new protein sequences. CD-Search results for protein sequences in Entrez…

CDD: specific functional annotation with the Conserved Domain Database
NCBI's Conserved Domain Database is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution, and provides annotation of domain footprints and conserved functional sites on protein sequences.
CDD: a conserved domain database for interactive domain family analysis
A novel helper application, CDTree, is presented, which enables users of the CDD resource to examine curated hierarchies. These hierarchies serve as a powerful tool in protein classification, as they allow users to analyze protein sequences in the context of domain family hierarchies.
Annotation of functional sites with the Conserved Domain Database
It is observed that CDD-based site annotation complements existing site annotation in many cases, which may, in part, originate from CDD's curation practice of collecting sites conserved across diverse taxa and supported by evidence from multiple 3D structures.
Database resources of the National Center for Biotechnology Information
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and
TarO: a target optimisation system for structural biology
TarO offers a single point of reference for key bioinformatics analyses relevant to selecting proteins or domains for study by structural biology techniques and obtains predictions of properties for these sequences including crystallisation propensity, protein disorder and post-translational modifications.
Improving protein structure similarity searches using domain boundaries based on conserved sequence information
Alternative domains, which have significantly different secondary structure composition from those based on structurally compact units, were identified based on the alignment footprints of curated protein sequence domain families and are in the process of inclusion into the VAST search and MMDB resources in the NCBI Entrez system.
Protein subfamily assignment using the Conserved Domain Database
This work proposes a method for assigning NCBI-curated domains from the Conserved Domain Database (CDD) that takes into account the organization of the domains into hierarchies of homologous domain models, and finds that simple heuristics based on sorting scores and domain-specific thresholds are effective at reducing classification error.
LigProf: A simple tool for in silico prediction of ligand-binding sites
It is shown that the LigProf method can already be applied successfully to the highly represented ligand-bound protein kinase domains of viral and human origin.
MMDB: annotating protein sequences with Entrez's 3D-structure database
An annotation service, Entrez's ‘Related Structure’ links, combines some of these tools automatically and presents 3D views mapping sequence residues onto all 3D structures available in MMDB.
Expanded microbial genome coverage and improved protein family annotation in the COG database
An update of the Clusters of Orthologous Groups of proteins, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level are presented.


CDD: a database of conserved domain alignments with links to domain three-dimensional structure
The Conserved Domain Database (CDD) is a compilation of multiple sequence alignments representing protein domains conserved in molecular evolution. It has been populated with alignment data from the
CD-Search: protein domain annotations on the fly
We describe the Conserved Domain Search service (CD-Search), a web-based tool for the detection of structural and functional domains in protein sequences. CD-Search uses BLAST® heuristics to
CDART: protein homology by domain architecture.
The Conserved Domain Architecture Retrieval Tool (CDART) performs similarity searches of the NCBI Entrez Protein Database based on domain architecture, defined as the sequential order of conserved
SMART 4.0: towards genomic data integration
Improvements in SMART are centred on the integration of data from completed metazoan genomes, and the ability to query SMART by Gene Ontology terms, improved structure database searching and batch retrieval of multiple entries.
Comparison of sequence and structure alignments for protein domains
It is found that domain alignments in publicly available collections based on sequence and structure comparison are largely consistent; however, the homologous regions identified by sequence comparison are often shorter than those identified by 3D structure comparison.
The COG database: an updated version includes eukaryotes
A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described. It is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies.
The last CTD repeat of the mammalian RNA polymerase II large subunit is important for its stability.
It is shown that removal or severe disruption of the last CTD repeat, but not point mutation of its CKII sites, results in its proteolytic degradation to the Pol IIb form in vivo, but does not appear to affect the specific transcription of genes.
Cn3D: sequence and structure views for Entrez.