The Pfam protein families database

@article{Finn2010ThePP,
  title={The Pfam protein families database},
  author={Robert D. Finn and Jaina Mistry and John G. Tate and Penny C. Coggill and Andreas Heger and Joanne E. Pollington and O. Luke Gavin and Prasad Gunasekaran and Goran Ceric and Kristoffer Forslund and Liisa Holm and Erik L. L. Sonnhammer and Sean R. Eddy and Alex Bateman},
  journal={Nucleic Acids Research},
  year={2010},
  volume={40},
  pages={D290--D301}
}
Pfam is a widely used database of protein families, containing more than 13 000 manually curated families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to…

Pfam: the protein families database
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in
InterPro: the integrative protein signature database
TLDR
The InterPro database integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs.
The challenge of increasing Pfam coverage of the human proteome
TLDR
A major focus for increasing Pfam coverage of the human proteome will be to improve the definition of existing families, suggesting that thousands of new families would need to be generated to cover them.
DPCfam: a new method for unsupervised protein family classification
TLDR
DPCfam is introduced, a new unsupervised procedure that uses sequence alignments and Density Peak Clustering to automatically classify homologous protein regions and shows potential both for assisting manual annotation efforts and for stand-alone classification of sparsely annotated protein datasets such as those from environmental metagenomics studies.
InterPro in 2011: new developments in the family and domain prediction database
TLDR
An overview of new developments in the InterPro database and its associated software since 2009 is given, including updates to database content, curation processes and Web and programmatic interfaces.
InterPro protein classification.
TLDR
This chapter reviews the signature methods found in the InterPro database, and provides an overview of the InterPro resource itself.
SUPERFAMILY 1.75 including a domain-centric gene ontology method
The SUPERFAMILY resource provides protein domain assignments at the structural classification of protein (SCOP) superfamily level for over 1400 completely sequenced genomes, over 120 metagenomes and
Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
TLDR
The results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized and strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
Distribution and cluster analysis of predicted intrinsically disordered protein Pfam domains
TLDR
It is found that predicted intrinsic disorder (PID) is not always conserved across Pfam domains, and it is hypothesized that grouping sets into shorter sequences with more uniform length will reveal more information about intrinsic disorder and lead to more finely structured and perhaps more accurate predictions.
Gene3D: merging structure and function for a Thousand genomes
TLDR
Gene3D provides accurate structural domain family assignments for over 1100 genomes and nearly 10 000 000 proteins, and provides a set of services, including an interactive genome coverage graph visualizer, DAS annotation resources, sequence search facilities and SOAP services.

References

Pfam: the protein families database
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in
InterPro: the integrative protein signature database
TLDR
The InterPro database integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs.
Pfam 10 years on: 10 000 families and still growing
TLDR
It is shown that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.
Pfam: clans, web tools and services
TLDR
Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are presented.
iPfam: visualization of protein–protein interactions in PDB at domain and amino acid resolutions
TLDR
A web resource is implemented that allows the investigation of protein interactions in the Protein Data Bank structures at the level of Pfam domains and amino acid residues.
Pfam: multiple sequence alignments and HMM-profiles of protein domains
TLDR
Pfam 2.0 matches one or more domains in 50% of Swissprot-34 sequences, and 25% of a large sample of predicted proteins from the Caenorhabditis elegans genome.
UniProt: the Universal Protein knowledgebase
TLDR
The Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt), which is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces.
Pfam: A comprehensive database of protein domain families based on seed alignments
TLDR
A database based on hidden Markov model profiles (HMMs), which combines high quality and completeness, and a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified.
Identifying Protein Domains with the Pfam Database
TLDR
This unit contains detailed information on how to access and utilize the information present in the Pfam database, namely the families, multiple alignments, and annotation.
Predicting active site residue annotations in the Pfam database
TLDR
A strict set of rules are developed, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family, which provides one of the largest available databases of active site annotation.