Database resources of the National Center for Biotechnology Information

@article{Sayers2020DatabaseRO,
  title={Database resources of the National Center for Biotechnology Information},
  author={Eric W. Sayers and Jeff Beck and James Rodney Brister and Evan E. Bolton and Kathi Canese and Donald C. Comeau and Kathryn Funk and Anne Ketter and Sunghwan Kim and Avi Kimchi and Paul A. Kitts and Anatoliy Kuznetsov and Stacy Lathrop and Zhiyong Lu and Kelly M. McGarvey and Thomas L. Madden and Terence D. Murphy and Nuala A. O'Leary and Lon Phan and Valerie A. Schneider and Françoise Thibaud-Nissen and Barton W. Trawick and Kim D. Pruitt and James Ostell},
  journal={Nucleic Acids Research},
  year={2020},
  volume={39},
  pages={D38--D51}
}
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign… 

Genome Warehouse: A Public Repository Housing Genome-scale Data
The Genome Warehouse (GWH) is a public repository housing genome assembly data for a wide range of species and delivering a series of web services for genome data submission, storage, release, and
BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database
TLDR
In comparison with BRAKER1 supported by a large volume of transcript data, BRAKER2 could produce a better gene prediction accuracy if the evolutionary distances to the reference species in the protein database were rather small.
CIGene: a literature-based online resource for cancer initiation genes
TLDR
It was found that 32 of the 96 genes with mutations in breast cancers were significantly associated with patient survival, and CIGene, the first literature-based online resource for CIGs, will serve as a useful gateway for the systematic analysis of cancer initiation.
Taxonomy annotation and guide tree errors in 16S rRNA databases
TLDR
The branching orders of the Greengenes and SILVA guide trees were found to disagree at comparable rates with each other and with taxonomy annotations according to the training set provided by RDP, indicating that the trees have comparable quality.
Genomic, transcriptomic, and structural analysis of Pseudomonas virus PA5oct highlights the molecular complexity among Jumbo phages
TLDR
Although the temporal regulation of the PA5oct genome expression reveals specific genome clusters expressed in early and late infection, many genes encoding experimentally observed structural proteins surprisingly appear to remain almost untranscribed throughout the infection cycle.
Understanding the molecular evolution of tiger diversity through DNA barcoding marker ND4 and NADH dehydrogenase complex using computational biology.
TLDR
The nucleotide composition and nucleotide distribution pattern of tiger ND genes showed the evolutionary pattern and origin of tiger and Panthera lineage concerning the molecular clock, which will help to understand their adaptive evolution.
GOAT: Genetic Output Analysis Tool: An open source GWAS and genomic region visualization tool
TLDR
This paper outlines some of the GOAT's leading features and characteristics and compares them to existing open source GWAS visualization tools such as Locus Zoom and the Integrative Genomics Viewer and presents future development plans for GOAT.
DAIRYdb: a manually curated reference database for improved taxonomy annotation of 16S rRNA gene sequences from dairy products
TLDR
The manually curated DAIRYdb strongly improves taxonomic annotation accuracy for microbiome studies in dairy environments and significantly outperformed all other databases independently of the classification algorithm by enabling more accurate taxonomy annotation down to the species rank.
Transposable element discovery and characterization of LTR-retrotransposon evolutionary lineages in the tropical fruit species Passiflora edulis
TLDR
Functional analyses disclosed that the Angela, Del, CRM and Tork lineages are conserved in wild Passiflora species, supporting the idea of a common expansion of Copia and Gypsy superfamilies and lending weight to the suggestion that LTR-RTs had a recent expansion into the analyzed gene-rich region of the P. edulis draft genome.

References

Entrez Gene: gene-centered information at NCBI
TLDR
The content of Entrez Gene represents the result of both curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases and from other databases within NCBI.
GenBank
TLDR
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for over 340 000 formally described species and integrates these records with a variety of other data including taxonomy nodes, genomes, protein structures, and biomedical journal literature in PubMed.
Clone DB: an integrated NCBI resource for clone-associated data
TLDR
The National Center for Biotechnology Information's Clone DB is an integrated resource providing information about and facilitating access to clones, which serve as valuable research reagents in many fields, including genome sequencing and variation analysis.
The NCBI Taxonomy database
TLDR
The NCBI Taxonomy database is a central organizing hub for many of the resources at the NCBI, and provides a means for clustering elements within other domains of NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web.
NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy
TLDR
Recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline are reported on.
dbSNP: the NCBI database of genetic variation
TLDR
The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.
The National Center for Biotechnology Information's Protein Clusters Database
TLDR
The NCBI Protein Clusters Database provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.
CDD: specific functional annotation with the Conserved Domain Database
TLDR
NCBI's Conserved Domain Database is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution, and provides annotation of domain footprints and conserved functional sites on protein sequences.
NCBI GEO: archive for high-throughput functional genomic data
TLDR
The Gene Expression Omnibus at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data and offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives.
NCBI Reference Sequences: current status, policy and new initiatives
TLDR
The recent growth of the RefSeq database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation are reported on.