The SUPERFAMILY 1.75 database in 2014: a doubling of data


We present updates to the SUPERFAMILY 1.75 online resource and protein sequence collection. The hidden Markov model library that provides sequence homology to SCOP structural domains remains unchanged at version 1.75. In the last 4 years SUPERFAMILY has more than doubled its holding of curated complete proteomes across all cellular life, from the 1400 proteomes reported in 2010 to 3258 at present. Outside of the main sequence collection, SUPERFAMILY continues to provide domain annotation for sequences provided by other resources such as UniProt, Ensembl, PDB, much of JGI Phytozome and selected subcollections of NCBI RefSeq. Despite this growth in data volume, SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but is constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence and, in the case of whole genomes, with enrichment analysis against a taxonomically defined background.
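
The abstract does not say which statistical test underlies the enrichment analysis. As a rough illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for term enrichment: given a background of N annotated sequences of which K carry a term, it computes the probability of seeing at least k carriers among the n sequences of a genome of interest. The function name and all counts are hypothetical and are not taken from SUPERFAMILY.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    N: sequences in the background (hypothetical: a taxonomic group)
    K: background sequences annotated with the term
    n: sequences in the genome of interest
    k: sequences in that genome annotated with the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)  # all ways to draw n sequences from the background
    upper = min(n, K)   # the most carriers a draw of size n can contain
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 40 of 500 genome sequences carry the term,
    # against 300 of 10000 in the background.
    p = hypergeom_enrichment_p(k=40, n=500, K=300, N=10000)
    print(f"enrichment p-value: {p:.3g}")
```

In practice a test like this would be run once per term, so the resulting p-values would also need a multiple-testing correction, which this sketch omits.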

DOI: 10.1093/nar/gku1041

Cite this paper

@inproceedings{Oates2015TheS1, title={The SUPERFAMILY 1.75 database in 2014: a doubling of data}, author={Matt E. Oates and Jonathan Stahlhacke and Dimitrios V. Vavoulis and Ben Smithers and Owen J. L. Rackham and Adam J. Sardar and Jan Zaucha and Natalie Thurlby and Hai Fang and Julian Gough}, booktitle={Nucleic Acids Research}, year={2015} }