The TIGRFAMs database of protein families

Daniel H. Haft, Jeremy D. Selengut and Owen White. Nucleic Acids Research, volume 31, issue 1.

TIGRFAMs is a collection of manually curated protein families consisting of hidden Markov models (HMMs), multiple sequence alignments, commentary, Gene Ontology (GO) assignments, literature references and pointers to related TIGRFAMs, Pfam and InterPro models. These models are designed to support both automated and manually curated annotation of genomes. TIGRFAMs contains models of full-length proteins and shorter regions at the levels of superfamilies, subfamilies and equivalogs, where… 

TIGRFAMs and Genome Properties in 2013
The Genome Properties database specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome.
TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes
The TIGRFAMs and Genome Properties systems are described, which are a collection of protein family definitions built to aid in high-throughput annotation of specific protein functions and a generator of phylogenetic profiles, through which new protein family functions may be discovered.
SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny
SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes and from large sequence collections such as UniProt; recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies.
Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures
This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation, and also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups.
InterPro protein classification.
This chapter reviews the signature methods found in the InterPro database and provides an overview of the InterPro resource itself.
PANTHER: a library of protein families and subfamilies indexed by function.
The PANTHER/X ontology is used to give a high-level representation of gene function across the human and mouse genomes, and the family HMMs are used to rank missense single nucleotide polymorphisms (SNPs) according to their likelihood of affecting protein function.
Seqrutinator: Non-Functional Homologue Sequence Scrutiny for the Generation of large Datasets for Protein Superfamily Analysis
Seqrutinator forms a consistent pipeline for sequence scrutiny that yields sequence sets generating high-fidelity MSAs; the three superfamilies furthermore show similar scrutiny patterns.
GOTrees: Predicting GO Associations from Protein Domain Composition Using Decision Trees
The method is more sensitive than InterPro2GO at the cost of only a small decrease in precision, improving sensitivity by 22%, 27% and 50% for Molecular Function, Biological Process and Cellular Component GO terms, respectively.
GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains
It is shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics.
InterPro and InterProScan
This chapter will describe how to use InterPro and InterProScan for protein sequence classification and comparative proteomics.


TIGRFAMs: a protein family resource for the functional identification of proteins
The term 'equivalog' is introduced to describe members of a set of homologous proteins that are conserved with respect to function since their last common ancestor to support the automated functional identification of proteins by sequence homology.
HMM-based databases in InterPro
This paper reviews the Pfam, TIGRFAMs and SMART databases, which use profile HMMs from the HMMER package to detect protein evolutionary relationships and function.
Gene Ontology: tool for the unification of biology
The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Distinguishing homologous from analogous proteins.
This work provides a means by which it is possible to determine whether two groups of related proteins have a common ancestor or are of independent origin, and how many nucleotide positions must differ in the genes encoding the two presumptively homologous proteins.
Identification of genes that are associated with DNA repeats in prokaryotes
A novel family of repetitive DNA sequences that is present among both domains of the prokaryotes but absent from eukaryotes or viruses is studied, characterized by direct repeats, varying in size from 21 to 37 bp, interspaced by similarly sized non‐repetitive sequences.
Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima
Genome analysis reveals numerous pathways involved in degradation of sugars and plant polysaccharides, and 108 genes that have orthologues only in the genomes of other thermophilic Eubacteria and Archaea.
Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12.
The complete chromosome sequence of an O157:H7 strain isolated from the Sakai outbreak is reported and compared with that of a benign laboratory strain, K-12 MG1655; the sequence shared by the two strains may represent the fundamental backbone of the E. coli chromosome.
Profile hidden Markov models
S. Eddy. Computer Science. 1998.
Profile HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise and complement standard pairwise comparison methods for large-scale sequence analysis.