A guide to UniProt for protein scientists.

@article{ODonovan2011AGT,
  title={A guide to {UniProt} for protein scientists},
  author={Claire O'Donovan and Rolf Apweiler},
  journal={Methods in Molecular Biology},
  year={2011},
  volume={694},
  pages={25--35}
}
One of the essential requirements of the proteomics community is a high-quality, annotated, nonredundant protein sequence database with stable identifiers and an archival service to enable protein identification and characterization. The scope of this chapter is to illustrate how the Universal Protein Resource (UniProt) (The UniProt Consortium, Nucleic Acids Res. 38:D142-D148, 2010) can best be used for proteomics purposes, with a particular focus on exploiting the knowledge captured in the… 
Finding Sequences for over 270 Orphan Enzymes
TLDR
Using this method, over 270 orphan enzymes were reconnected with their corresponding sequences, and this success points toward how to systematically eliminate the remaining orphan enzymes and prevent the introduction of future ones.
Functional inference by ProtoNet family tree: the uncharacterized proteome of Daphnia pulex
TLDR
The scaffold of ProtoNet can be used as an alignment-free protocol for the large-scale annotation of uncharacterized proteomes; the study focuses on gene amplification as a leading strategy used by Daphnia to cope with environmental toxicity.
The Potential Cost of High-Throughput Proteomics
TLDR
Improved strategies for data validation need to be implemented, along with a change in the culture of high-throughput proteomics, linking proteomics closer to biology.
First survey and functional annotation of prohormone and convertase genes in the pig
TLDR
The present genomic and functional characterization supports the use of the pig as an effective animal model for gaining a deeper understanding of prohormones, prohormone convertases, and neuropeptides in biomedical and agricultural research.
Transcriptome and long noncoding RNA sequencing of three extracellular vesicle subtypes released from the human colon cancer LIM1863 cell line
TLDR
The data reveal several potential lncRNA biomarkers for colorectal cancer (CRC) and novel splicing/fusion genes that, collectively, will advance the understanding of extracellular vesicle (EV) biology in CRC and accelerate the development of EV-based diagnostics and therapeutics.
iTRAQ-Based Comparative Proteomic Analysis of Acinetobacter baylyi ADP1 Under DNA Damage in Relation to Different Carbon Sources
TLDR
The study reveals that the DNA damage response of A. baylyi ADP1 at the translational level is significantly altered by carbon source. This provides insight into the complex protein interactions across carbon sources and offers theoretical clues for further studies of the general regulatory mechanisms by which the bacterium adapts to different nutrient environments.
Disulfide Connectivity Prediction Based on Modelled Protein 3D Structural Information and Random Forest Regression
TLDR
A new feature extracted from predicted protein 3D structural information is proposed and combined with traditional features to form a discriminative feature set, and a random forest regression model is then used to predict protein disulfide connectivity.
...

References

Showing 1-10 of 15 references
The International Protein Index: An integrated database for proteomics experiments
TLDR
IPI (the International Protein Index) has been developed and offers complete nonredundant data sets representing the human, mouse and rat proteomes, built from the Swiss‐Prot, TrEMBL, Ensembl and RefSeq databases.
The Universal Protein Resource (UniProt) in 2010
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with
A novel method for automatic functional annotation of proteins
TLDR
A method of automatic annotation that produces highly reliable functional predictions using the language and syntax of SWISS-PROT is developed and successfully applied to the automatic annotation of a test set of unknown proteins.
UniProt archive
TLDR
UniProt Archive (UniParc) is the most comprehensive, non-redundant protein sequence database available and contains only protein sequences and database cross-references; all other information must be retrieved from the source databases.
Gene Ontology: tool for the unification of biology
TLDR
The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Large‐scale, classification‐driven, rule‐based functional annotation of proteins
TLDR
Rule-based annotation leads to facile, accurate prediction and functional inference for uncharacterized proteins, allows systematic detection of genome annotation errors, and provides sensible propagation and standardization of protein annotation.
Ensembl 2008
TLDR
Major additions and improvements to Ensembl since the previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein–DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing.
VARSPLIC: alternatively-spliced protein sequences derived from SWISS-PROT and TrEMBL
TLDR
The program varsplic.pl uses information present in the SWISS-PROT and TrEMBL databases to create new records for alternatively spliced isoforms that can be used in similarity searches.
UniRef: comprehensive and non-redundant UniProt reference clusters
TLDR
UniRef (UniProt Reference Clusters) provides clustered sets of sequences from the UniProt Knowledgebase and selected UniProt Archive records, giving complete coverage of sequence space at several resolutions while hiding redundant sequences.
...