The TIGRFAMs database of protein families
@article{Haft2003TheTD, title={The TIGRFAMs database of protein families}, author={Daniel H. Haft and Jeremy D. Selengut and Owen White}, journal={Nucleic Acids Research}, year={2003}, volume={31}, number={1}, pages={371--373} }
TIGRFAMs is a collection of manually curated protein families consisting of hidden Markov models (HMMs), multiple sequence alignments, commentary, Gene Ontology (GO) assignments, literature references and pointers to related TIGRFAMs, Pfam and InterPro models. These models are designed to support both automated and manually curated annotation of genomes. TIGRFAMs contains models of full-length proteins and shorter regions at the levels of superfamilies, subfamilies and equivalogs, where…
803 Citations
TIGRFAMs and Genome Properties in 2013
- Biology; Nucleic Acids Res.
- 2013
The Genome Properties database specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome.
TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes
- Biology; Nucleic Acids Res.
- 2007
This paper describes two systems: TIGRFAMs, a collection of protein family definitions built to aid high-throughput annotation of specific protein functions, and Genome Properties, a generator of phylogenetic profiles through which new protein family functions may be discovered.
SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny
- Biology; Nucleic Acids Res.
- 2009
SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes and from large sequence collections such as UniProt. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies.
Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures
- Biology, Computer Science; BMC Bioinformatics
- 2012
This approach automates the rapid creation of protein domain hierarchies and thus should eliminate one of the most time-consuming aspects of conserved domain database curation. It also facilitates protein domain annotation by identifying the pattern residues that most distinguish each protein domain subgroup from other related subgroups.
InterPro protein classification.
- Biology; Methods in Molecular Biology
- 2011
This chapter reviews the signature methods found in the InterPro database and provides an overview of the InterPro resource itself.
PANTHER: a library of protein families and subfamilies indexed by function.
- Biology; Genome Research
- 2003
The PANTHER/X ontology is used to give a high-level representation of gene function across the human and mouse genomes, and the family HMMs are used to rank missense single nucleotide polymorphisms (SNPs) according to their likelihood of affecting protein function.
Seqrutinator: Non-Functional Homologue Sequence Scrutiny for the Generation of large Datasets for Protein Superfamily Analysis
- Biology; bioRxiv
- 2022
Seqrutinator forms a consistent pipeline for sequence scrutiny that yields sequence sets that generate high-fidelity multiple sequence alignments (MSAs); the three superfamilies studied also show similar scrutiny patterns.
GOTrees: Predicting GO Associations from Protein Domain Composition Using Decision Trees
- Computer Science; Pacific Symposium on Biocomputing
- 2005
The method is more sensitive than InterPro2GO at the cost of only a small decrease in precision, improving sensitivity by 22%, 27% and 50% for Molecular Function, Biological Process and Cellular Component GO terms, respectively.
GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains
- Biology; Nucleic Acids Research
- 2010
It is shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics.
InterPro and InterProScan
- Biology
- 2007
This chapter will describe how to use InterPro and InterProScan for protein sequence classification and comparative proteomics.
References
TIGRFAMs: a protein family resource for the functional identification of proteins
- Biology; Nucleic Acids Res.
- 2001
To support the automated functional identification of proteins by sequence homology, the term 'equivalog' is introduced to describe members of a set of homologous proteins whose function has been conserved since their last common ancestor.
HMM-based databases in InterPro
- Computer Science; Briefings Bioinform.
- 2002
This paper reviews the Pfam, TIGRFAMs and SMART databases, which use profile HMMs built with the HMMER package to detect evolutionary relationships and infer protein function.
Gene Ontology: tool for the unification of biology
- Biology; Nature Genetics
- 2000
The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Distinguishing homologous from analogous proteins.
- Biology; Systematic Zoology
- 1970
This work provides a means by which it is possible to determine whether two groups of related proteins have a common ancestor or are of independent origin, and how many nucleotide positions must differ in the genes encoding the two presumptively homologous proteins.
Identification of genes that are associated with DNA repeats in prokaryotes
- Biology; Molecular Microbiology
- 2002
A novel family of repetitive DNA sequences, present in both prokaryotic domains but absent from eukaryotes and viruses, is studied. The family is characterized by direct repeats of 21 to 37 bp interspaced by similarly sized non-repetitive sequences.
Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima
- Biology; Nature
- 1999
Genome analysis reveals numerous pathways involved in degradation of sugars and plant polysaccharides, and 108 genes that have orthologues only in the genomes of other thermophilic Eubacteria and Archaea.
Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12.
- Biology; DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes
- 2001
The complete chromosome sequence of an O157:H7 strain isolated from the Sakai outbreak is reported and compared with that of a benign laboratory strain, K-12 MG1655. The comparison identifies sequence conserved between the two strains, which may represent the fundamental backbone of the E. coli chromosome.
Profile hidden Markov models
- Computer Science; Bioinform.
- 1998
Profile HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise and complement standard pairwise comparison methods for large-scale sequence analysis.