The Pfam protein families database.

@article{Bateman2004ThePP,
  title={The Pfam protein families database.},
  author={Alex Bateman and Lachlan James M. Coin and Richard Durbin and Robert D. Finn and Volker Hollich and Sam Griffiths-Jones and Ajay Khanna and Mhairi Marshall and Simon Moxon and Erik L. L. Sonnhammer and David J. Studholme and Corin A. Yeats and Sean R. Eddy},
  journal={Nucleic acids research},
  year={2004},
  volume={32 Database issue},
  pages={D138-41}
}
Pfam is a large collection of protein families and domains. Over the past 2 years the number of families in Pfam has doubled and now stands at 6190 (version 10.0). Methodology improvements for searching the Pfam collection locally as well as via the web are described. Other recent innovations include modelling of discontinuous domains allowing Pfam domain definitions to be closer to those found in structure databases. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam… 

Pfam: the protein families database
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in
The Viral MetaGenome Annotation Pipeline (VMGAP): an automated tool for the functional annotation of viral Metagenomic shotgun sequencing data
TLDR
The Viral MetaGenome Annotation Pipeline (VMGAP) takes advantage of a number of specialized databases, such as collections of mobile genetic elements and environmental metagenomes, to improve the classification and functional prediction of viral gene products.
DASMI: exchanging, annotating and assessing molecular interaction data
TLDR
The DASMI system for the dynamic exchange, annotation and assessment of molecular interaction data is introduced; it affords online retrieval of the most recent data from distributed sources and databases.
SynerClust: a highly scalable, synteny-aware orthologue clustering tool
TLDR
SynerClust was designed to analyse genomes with high levels of local synteny, particularly prokaryotes, which have operon structure. It is able to more completely identify sets of core genes for datasets that include diverse strains, while using substantially less memory and with scalability comparable to the fastest tools.
Extensive DNA mimicry by the ArdA anti-restriction protein and its role in the spread of antibiotic resistance
TLDR
The structure of the ArdA protein from the conjugative transposon Tn916 is solved and it is found that it has a novel extremely elongated curved cylindrical structure with defined helical grooves, explaining how ArdA can bind and inhibit the Type I restriction enzymes.
Inspecting abundantly expressed genes in male strobili in sugi (Cryptomeria japonica D. Don) via a highly accurate cDNA assembly
TLDR
A three-stage assembly workflow using the de novo transcriptome assembly tools Oases and Trinity is designed, and it is demonstrated that the transcriptome assembly output is valuable and useful for further studies in functional genomics and evolutionary biology.
Trypanosoma cruzi iron superoxide dismutases: insights from phylogenetics to chemotherapeutic target assessment
TLDR
The results suggest that T. cruzi FeSOD types are members of distinct families, support the hypothesis that gene duplication followed by divergence shaped the evolution of T. cruzi FeSODs, and provide a successful approach to the study of gene/protein families as potential drug targets.
Genome-wide identification and expression analysis of the ERF transcription factor family in pineapple (Ananas comosus (L.) Merr.)
TLDR
Synteny and cis-element analysis of ERF genes provided deep insight into the evolution and function of pineapple ERFs, and the results provide useful information for further investigating the evolution and functions of the ERF family in pineapple.
Analysis of the bovine rumen microbiome reveals a diversity of Sus-like polysaccharide utilization loci from the bacterial phylum Bacteroidetes
TLDR
It is suggested that Sus-like systems represent an important mechanism for degradation of a range of plant-derived glycans in ruminants.

References

Pfam: clans, web tools and services
TLDR
Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are presented.
iPfam: visualization of protein–protein interactions in PDB at domain and amino acid resolutions
TLDR
A web resource is implemented that allows the investigation of protein interactions in the Protein Data Bank structures at the level of Pfam domains and amino acid residues.
ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons
TLDR
ProDom contains all protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases and results from a similar domain analysis as applied to completed genomes.
Pfam: A comprehensive database of protein domain families based on seed alignments
TLDR
A database based on hidden Markov model profiles (HMMs), which combines high quality and completeness, is presented, and a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified.
ADDA: a domain database with global coverage of the protein universe
TLDR
A database of protein domain families with complete coverage of all protein sequences and 3828 novel domain families that do not overlap with the curated domain databases Pfam, SCOP and InterPro are generated.
The ProDom database of protein domain families: more emphasis on 3D
TLDR
ProDom-SG, a ProDom-based server dedicated to the selection of candidate proteins for structural genomics, has been developed.
SCOP database in 2004: refinements integrate structure and sequence family data
TLDR
A refinement of the SCOP classification is initiated, which introduces a number of changes mostly at the levels below superfamily, and modernization of the interface capabilities of SCOP allowing more dynamic links with other databases is started.
The Protein Data Bank
TLDR
The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.
Enhanced protein domain discovery using taxonomy
TLDR
By incorporating the understanding of the taxonomic distribution of specific protein domains, the method can enhance domain recognition in protein sequences and incorporate other context-specific domain distributions – such as domain co-occurrence and protein localisation.