The Pfam protein families database

@article{Bateman2004ThePP,
  title={The Pfam protein families database},
  author={Alex Bateman and Lachlan James M. Coin and Richard Durbin and Robert D. Finn and Volker Hollich and Sam Griffiths-Jones and Ajay Khanna and Mhairi Marshall and Simon Moxon and Erik L. L. Sonnhammer and David J. Studholme and Corin A. Yeats and Sean R. Eddy},
  journal={Nucleic acids research},
  year={2004},
  volume={28},
  number={1},
  pages={263--266}
}
Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgr.ki.se/Pfam/ and in the US at http://pfam.wustl.edu/. The latest version (4.3) of Pfam contains 1815 families. These Pfam families match 63% of proteins in SWISS-PROT 37 and TrEMBL 9. For complete genomes Pfam currently matches up to half of the proteins. Genomic DNA can be directly… 

EXProt: a database for proteins with an experimentally verified function
TLDR
In EXProt release 2.0, entries from the Pseudomonas aeruginosa community annotation project, the Escherichia coli genome and proteome database and the translated coding sequences from the Prokaryotes division of EMBL nucleotide sequence database are described as having an experimentally verified function.
GTOP: a database of protein structures predicted from genome sequences
TLDR
The Genomes TO Protein structures and functions (GTOP) database is constructed, containing protein fold predictions of a huge number of sequences, mainly carried out with the homology search program PSI-BLAST, currently the most popular among high-sensitivity profile search methods.
TranScout: prediction of gene expression regulatory proteins from their sequences
TLDR
A program ('TranScout') has been developed for the detection and evaluation of conserved motifs in prokaryotic and eukaryotic sequences of proteins with a gene regulatory function and the efficiency of the program is shown in a benchmark against a database obtained from SWISS-PROT without the protein sequences used to train the program.
A Versatile Structural Domain Analysis Server Using Profile Weight Matrices
TLDR
The WEB tool "AnDom" assigns to a given protein sequence all experimentally determined structural domains contained within it, including multidomain and large proteins, allowing numerous applications for structural genomics including investigation of complex eucaryotic protein families.
FIGfams: yet another set of protein families
TLDR
This work presents FIGfams, a new collection of over 100 000 protein families that are the product of manual curation and close strain comparison. Associated with each FIGfam is a two-tiered, rapid, accurate decision procedure to determine family membership for new proteins.
The PredictProtein server
PredictProtein (PP, http://cubic.bioc.columbia.edu/pp/) is an internet service for sequence analysis and the prediction of aspects of protein structure and function. Users submit protein sequence or
Safe Functional Inference for Uncharacterized Viral Proteins
TLDR
The ProtoNet resource is used to develop a methodology for a consistent and safe functional inference for remote families, and a new clustering scheme is provided based on direct clustering of all detectable sequence similarities.
The PROF_PAT Protein Pattern Database: Assessment of Efficiency
TLDR
Analysis through the Internet of 20 amino acid sequences having no descriptions in the TrEMBL database demonstrated that PROF_PAT, being highly competitive with its counterparts in specificity, surpasses them in amplitude and variety of proteins, working several times as fast.
SUPFAM - a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes
TLDR
Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space.
A structure-based method for protein sequence alignment
MOTIVATION: With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that
