• Corpus ID: 52856658

Database resources of the National Center for Biotechnology Information

  title={Database resources of the National Center for Biotechnology Information},
  author={David L. Wheeler and Tanya Barrett and Dennis A. Benson and Stephen H. Bryant and Kathi Canese and Deanna M. Church and Michael DiCuccio and Ron Edgar and Scott Federhen and Wolfgang Helmberg and David L. Kenton and Oleg Khovayko and David J. Lipman and Thomas L. Madden and Donna R. Maglott and James Ostell and Joan U. Pontius and Kim D. Pruitt and Gregory D. Schuler and Lynn M. Schriml and Edwin Sequeira and Stephen T. Sherry and Karl Sirotkin and Grigory Starchenko and Tugba Onal Suzek and Roman L. Tatusov and Tatiana A. Tatusova and Lukas Wagner and Eugene Yaschenko},
  journal={Nucleic acids research},
  volume={28 1},
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and on a variety of other biological data made available through NCBI's Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, Database of Single Nucleotide…

Genomic resources for chicken

  • P. Antin, J. Konieczka
  • Biology
    Developmental dynamics : an official publication of the American Association of Anatomists
  • 2005
This primer provides an overview of online genomic resources for the chicken, including the Ensembl, UCSC, and NCBI annotated chicken genome browsers; expressed sequence tag and in situ hybridization databases; and sources for microarrays, cDNAs, and bacterial artificial chromosomes.

A Database of Selected Marine Genomics for Retrieving Distantly Related Proteins

The proposed system proved suitable for similarity comparison of distantly related proteins: MSGD retrieved several important protein sequences that well-known residue-based matching methods failed to identify.

Re-annotation of genome microbial CoDing-Sequences: finding new genes and inaccurately annotated genes

A new program was developed that automatically identifies biologically significant candidate genes in a bacterial genome. Its gene-finding accuracy was assessed by comparison with existing annotations, revealing that a small but not negligible number of genes annotated within the framework of sequencing projects are likely to be partially inaccurate or plainly wrong.

AsMamDB: an alternative splice database of mammals

The AsMamDB database aims to facilitate the systematic study of alternatively spliced genes in mammals. It includes alternative splicing patterns, gene structures, chromosomal locations, gene products and the tissues in which the genes are expressed.

PhenoGO: an integrated resource for the multiscale mining of clinical and biological data

The PhenoGO database has been significantly extended with gene-disease specific annotations and now includes an additional ten species, providing phenotypic context for mining existing associations between gene products and GO terms specified in the Gene Ontology databases.

Detection of signals in mRNAs that influence translation.

The analysis indicates that some organisms with extremely high GC% genomes do not have a strong dependence on base pairing ribosome binding sites, as the complementary sequence is absent from many genes.

An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome

Integrating plant databases with Ondex increased the overall quantity and quality of the information available and thereby improved the resulting annotation. It also yielded new biological insights into water stress and highlighted potential candidate genes that breeders could use to improve drought response.

Homology search for genes

A homology search solution is presented that automates this process, returns complete gene structures instead of HSPs, and achieves better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene.

MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002

MitoNuc is a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. A new field has been defined in the database: the cluster identifier, an alphanumeric code used to identify each cluster of homologous proteins.

Comparison of human (and other) genome browsers

This review describes the basic functionality of genome browsers and compares three of them: the University of California Santa Cruz (UCSC) Genome Browser, the Ensembl Genome browser and the NCBI MapViewer.



NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

Recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline are reported on.

The National Center for Biotechnology Information's Protein Clusters Database

The NCBI Protein Clusters Database provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.

The NCBI Taxonomy database

The NCBI Taxonomy database is a central organizing hub for many of the resources at the NCBI. It provides a means for clustering elements within other domains of the NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web.

dbSNP: the NCBI database of genetic variation

The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.

Entrez Gene: gene-centered information at NCBI

Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.

CDD: specific functional annotation with the Conserved Domain Database

NCBI's Conserved Domain Database is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution, and provides annotation of domain footprints and conserved functional sites on protein sequences.

NCBI GEO: archive for functional genomics data sets—update

The Gene Expression Omnibus is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community and supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable.

The mouse genome database (MGD): new features facilitating a model system

The mouse genome database (MGD), the international community database for mouse, provides access to extensive integrated data on the genetics, genomics and biology of the laboratory mouse and is the authoritative source for mouse nomenclature for genes, alleles, and mouse strains, and for GO annotations to mouse genes.

NCBI Epigenomics: a new public resource for exploring epigenomic data sets

The Epigenomics database at the National Center for Biotechnology Information (NCBI) is a new resource that has been created to serve as a comprehensive public resource for whole-genome epigenetic data sets and provides the user with a unique interface that allows for intuitive browsing and searching of data sets based on biological attributes.

The sequence read archive: explosive growth of sequencing data

The content and structure of the SRA are presented, along with updated metadata structures, submission file formats and supported sequencing platforms, and various responses to the challenge of explosive data growth are outlined.