InterPro in 2017—beyond protein family and domain annotations
TLDR
We report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation.
  • 940
  • 63
A large-scale evaluation of computational protein function prediction
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If…
  • 641
  • 39
InterPro in 2019: improving coverage, classification and access to protein sequence annotations
TLDR
We report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new website.
  • 519
  • 35
BayGenomics: a resource of insertional mutations in mouse embryonic stem cells
TLDR
The BayGenomics gene-trap resource (http://baygenomics.ucsf.edu) provides researchers with access to thousands of mouse embryonic stem (ES) cell lines harboring characterized insertional mutations in both known and novel genes.
  • 250
  • 21
Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
TLDR
We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available.
  • 553
  • 18
An expanded evaluation of protein function prediction methods shows an improvement in accuracy
TLDR
We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology.
  • 244
  • 15
Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies
The protein sequence and structure databases are now sufficiently representative that strategies nature uses to evolve new catalytic functions can be identified. Groups of divergently related enzymes…
  • 452
  • 13
Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies
The dramatic increase in heterogeneous types of biological data (in particular, the abundance of new protein sequences) requires fast and user-friendly methods for organizing this information in a way…
  • 276
  • 13
The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-protons of carboxylic acids
We have discovered a superfamily of enzymes related by their ability to catalyze the abstraction of the alpha-proton of a carboxylic acid to form an enolic intermediate. Although each reaction…
  • 280
  • 12
The Structure–Function Linkage Database
TLDR
The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies.
  • 142
  • 12