Database resources of the National Center for Biotechnology Information: 2002 update

  David L. Wheeler, Deanna M. Church, Alex E. Lash, Detlef D. Leipe, Thomas L. Madden, Joan U. Pontius, Gregory D. Schuler, Lynn M. Schriml, Tatiana A. Tatusova, Lukas Wagner and Barbara A. Rapp
  Nucleic Acids Research
  Volume 30, Issue 1
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single… 
SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data
SOURCE provides content in both gene-centric and cDNA clone-centric pages, which simplifies the analysis of datasets generated with cDNA microarrays and facilitates statistical analyses such as assessing the enrichment of functional attributes within clusters of genes.
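The summary does not say which statistic SOURCE uses for enrichment. The sketch below assumes a one-sided hypergeometric test, a common choice for asking whether a gene cluster contains more genes with a given functional annotation than chance would predict. The function name and parameters are hypothetical and not part of SOURCE.

```python
from math import comb


def enrichment_p_value(total_genes: int, annotated: int,
                       cluster_size: int, hits: int) -> float:
    """One-sided hypergeometric P(X >= hits).

    total_genes  -- genes in the background set (hypothetical input)
    annotated    -- background genes carrying the functional attribute
    cluster_size -- genes in the cluster being tested
    hits         -- cluster genes carrying the attribute
    """
    if not (0 <= annotated <= total_genes and 0 <= cluster_size <= total_genes):
        raise ValueError("counts out of range")
    upper = min(annotated, cluster_size)
    # Sum the probabilities of seeing `hits` or more annotated genes in the cluster.
    tail = sum(
        comb(annotated, x) * comb(total_genes - annotated, cluster_size - x)
        for x in range(max(hits, 0), upper + 1)
    )
    return tail / comb(total_genes, cluster_size)


if __name__ == "__main__":
    # A 5-gene cluster drawn from 10 genes, 5 of which carry the attribute.
    print(enrichment_p_value(10, 5, 5, 5))
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for genome-scale counts a library survival function would usually be faster.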
CDD: a curated Entrez database of conserved domain alignments
The Conserved Domain Database (CDD), which mirrors the publicly available domain alignment collections SMART and PFAM and also contains alignment models curated at NCBI, is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R).
DBHR: a collection of databases relevant to human research
DBHR is a collection of human research databases on a single platform, organized into six categories (DNA, RNA, protein, expression, pathway and disease databases), that stores, organizes and shares data in a structured and searchable manner.
FIE2: a program for the extraction of genomic DNA sequences around the start and translation initiation site of human genes
The authors are not aware of any publicly available web-based tool that uses the human genomic sequence to extract pertinent promoter- and TIS-region information in this fashion, and FIE2 is freely available at
GALA, a database for genomic sequence alignments and annotations.
GALA is a relational database containing whole-genome sequence alignments between human and mouse, together with extensive annotations of the human sequence, that can reveal a wide variety of relationships.
Yeast genomic databases and the challenge of the post-genomic era
The results indicate that post-genomic technologies are providing rich new information for nearly all yeast genes, but data from these experiments are scattered across many Web sites, and the results from these experiments are poorly integrated with other forms of yeast knowledge.
iMap: a database-driven utility to integrate and access the genetic and physical maps of maize
iMap consists of a relational database, containing integrated information produced by applying a set of anchoring rules that assign BAC contigs to specific locations on the genetic map, together with a graphical map browser and search utility for viewing and retrieving many types of genetic and physical map data.
Gene Indexing: Characterization and Analysis of NLM's GeneRIFs
The authors develop a prototype functional alerting system for researchers based on GeneRIFs, along with a strategy for finding all of the literature related to genes.
GeneHuggers: database mining and application connectivity tools for subsequence analyses of the human genome
GeneHuggers adds functionality to the UNIX operating system that supports the development of customized bioinformatics programs for precisely selecting subsequence regions from records of the RefSeq human genome database.
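Both GeneHuggers and FIE2 revolve around pulling a defined region out of a genomic record, for example a window around a transcription start or translation initiation site. The sketch below illustrates that idea for a plain sequence string; it is not either tool's interface. The function names, the 0-based coordinate convention and the rule that the site base counts toward `downstream` are all assumptions made for illustration.

```python
# Complement table for DNA (hypothetical helper, handles upper/lower case and N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_window(genome: str, pos: int, upstream: int, downstream: int,
                   strand: str = "+") -> str:
    """Extract `upstream` bases before and `downstream` bases from a site.

    `pos` is the 0-based genomic coordinate of the site. The result is
    always returned in the gene's orientation, so the site sits at index
    `upstream` of the returned string.
    """
    if strand == "+":
        start, end = pos - upstream, pos + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies to the right in genomic coordinates.
        start, end = pos - downstream + 1, pos + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(genome):
        raise ValueError("window extends past the sequence ends")
    window = genome[start:end]
    return window if strand == "+" else reverse_complement(window)
```

Returning minus-strand windows as reverse complements keeps promoter regions from both strands directly comparable, which is usually what downstream motif analysis expects.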


dbSNP: the NCBI database of genetic variation
The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.
dbSNP: a database of single nucleotide polymorphisms
In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Cancer
BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences.
This paper describes 'BLAST 2 Sequences', a new BLAST-based tool for aligning two protein or nucleotide sequences that uses the BLAST algorithm for pairwise DNA-DNA or protein-protein sequence comparison.
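BLAST itself is a heuristic that seeds and extends short word matches rather than filling a full dynamic-programming matrix. To show what a pairwise local alignment score means, the sketch below swaps in the exact Smith-Waterman recurrence with a linear gap penalty; it is not the BLAST algorithm, and the scoring values are arbitrary assumptions rather than BLAST defaults or a substitution matrix such as BLOSUM62.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gaps).

    Keeps only two rows of the DP matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman_score("TTACGTAA", "GGACGTCC")` scores only the shared `ACGT` core, because the flanking mismatches would lower the score. Exact alignment costs time proportional to the product of the sequence lengths, which is the cost BLAST's heuristics are designed to avoid.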
RefSeq and LocusLink: NCBI gene-centered resources
Together, RefSeq and LocusLink provide a non-redundant view of genes and other loci to support research on genes and gene families, variation, gene expression and genome annotation.
Complete genomes in WWW Entrez: data representation and analysis
Flexible web based views, precomputed relationships, and immediate access to analytical tools provide scientists with a portal into the new insights to be gained from completed genome sequences.
The Protein Information Resource (PIR)
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in
The COG database: a tool for genome-scale analysis of protein functions and evolution
The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes.
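Orthology-based classifications such as COGs start from best sequence-similarity hits between genomes. The published COG procedure groups proteins from triangles of genome-specific best hits spanning at least three lineages. The simpler two-genome reciprocal best hit check below is a stand-in that illustrates only the underlying best-hit idea. The input format (a dict of precomputed similarity scores) and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query protein to its highest-scoring subject.

    Ties keep the first pair encountered (dict insertion order).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), s in scores.items():
        if query not in best or s > best[query][1]:
            best[query] = (subject, s)
    return {q: subj for q, (subj, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}
```

Requiring the hit to be best in both directions filters out many paralog matches that a one-way best hit would accept, which is part of the motivation for best-hit consistency in COG construction.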
Entrez: molecular biology database and retrieval system.
Gene Expression Omnibus: NCBI gene expression and hybridization array data repository
The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design
KEGG: Kyoto Encyclopedia of Genes and Genomes
KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes.