Database Resources of the National Center for Biotechnology Information

Denis Vakatov and Eugene Yaschenko. Nucleic Acids Research, pages D12-D17.
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. The Entrez system provides search and retrieval operations for most of these data from 37 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the Web… 
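As a rough illustration of the statement above that the E-utilities serve as the programming interface for the Entrez system, the sketch below builds an ESearch request URL without sending it. The helper name build_esearch_url and the example query are hypothetical and chosen only for illustration. The base URL and the db, term, retmax and retmode parameters follow the public E-utilities conventions as commonly documented, and should be checked against the current NCBI documentation before use.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities web service (assumed from public NCBI docs).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for one Entrez database and query term.

    Hypothetical helper: it only constructs the URL and performs no
    network request.
    """
    params = {
        "db": db,           # Entrez database name, e.g. "pubmed"
        "term": term,       # Entrez query string
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    # urlencode percent-escapes characters such as "[" and "]" in the query.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("pubmed", "GenBank[Title]", retmax=5))
```

The resulting URL could then be fetched with any HTTP client. Keeping URL construction separate from the request makes the query easy to inspect and to test offline.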

AnnotationBustR: an R package to extract subsequences from GenBank annotations
AnnotationBustR allows users to extract sequences based on GenBank annotations through the ACNUC retrieval system given search terms of gene synonyms and accession numbers and writes them to a FASTA file for users to employ in their research endeavors.
VarStack: a web tool for data retrieval to interpret somatic variants in cancer
VarStack saves time by providing variant data from multiple databases in an interpretable, easy-to-export format, and offers batch search and data download options that users can easily incorporate into their workflows or tools.
cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters
Genes involved in coordinated biological pathways, including metabolism, drug resistance and virulence, are often colocalised as gene clusters. Identifying homologous gene clusters aids in the study
MetaPhinder—Identifying Bacteriophage Sequences in Metagenomic Data Sets
MetaPhinder is presented, a method to identify assembled genomic fragments (i.e. contigs) of phage origin in metagenomic data sets based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages.
RESCRIPt: Reproducible sequence taxonomy reference database management for the masses
RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications.
How to use the MEROPS database and website to help understand peptidase specificity
Recommendations are made about how best to analyze data and analyses to indicate peptidase binding site preferences and exclusions and identify peptidases where co‐operative binding occurs between adjacent binding sites.
DNA microarray technology and bioinformatic web services.
This article surveys a variety of online and offline databases, software, and tools for microarray probe design.
PSORTdb 4.0: expanded and redesigned bacterial and archaeal protein subcellular localization database incorporating new secondary localizations
This expanded PSORTdb database will be of wide use to researchers developing SCL predictors or studying diverse microbes, including medically, agriculturally and industrially important species that have either classic or atypical cell envelope structures or vesicles.
Chickspress: a resource for chicken gene expression
Chickspress is reported, the first publicly available gene expression resource for chicken tissues, which incorporates both NCBI and Ensembl gene models and links these gene sets with experimental gene expression data and QTL information, and can be compared to each of these prediction workflows for these products.


Database resources of the National Center for Biotechnology Information
  • Richa Agarwala, Tanya Barrett, Jeff Beck, Dennis A Benson, Colleen Bollin, Evan Bolton, Devon Bourexi, …, Kerry Zbicz
  • Computer Science
    Nucleic Acids Res.
  • 2018
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for over 340 000 formally described species and integrates these records with a variety of other data including taxonomy nodes, genomes, protein structures, and biomedical journal literature in PubMed.
The National Center for Biotechnology Information's Protein Clusters Database
The NCBI Protein Clusters Database provides an efficient method to aggregate gene and protein annotation for researchers and is available at
Education resources of the National Center for Biotechnology Information
The National Center for Biotechnology Information (NCBI) hosts 39 literature and molecular biology databases containing almost half a billion records and provides teaching materials such as tutorials, problem sets and educational tools such as the Amino Acid Explorer, PSSM Viewer and Ebot.
dbSNP: the NCBI database of genetic variation
The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.
Entrez Gene: gene-centered information at NCBI
Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.
NCBI Reference Sequences: current status, policy and new initiatives
The recent growth of the RefSeq database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation are reported on.
Complete genomes in WWW Entrez: data representation and analysis
Flexible web based views, precomputed relationships, and immediate access to analytical tools provide scientists with a portal into the new insights to be gained from completed genome sequences.
Entrez: molecular biology database and retrieval system.
The Protein Information Resource
The Protein Information Resource is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery and has developed a bibliography system for literature searching, mapping, and user submission.