InterPro in 2017—beyond protein family and domain annotations
Recent developments with InterPro are reported, including the addition of two new databases, and the functionality to include residue-level annotation and prediction of intrinsic disorder, which enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
A large-scale evaluation of computational protein function prediction
Today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets, yet there is still considerable need for improvement of currently available tools.
InterPro in 2019: improving coverage, classification and access to protein sequence annotations
Recent developments with InterPro (version 70.0) and its associated software are reported, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website.
BayGenomics: a resource of insertional mutations in mouse embryonic stem cells
The BayGenomics gene-trap resource (http://baygenomics.ucsf.edu) provides researchers with access to thousands of mouse embryonic stem (ES) cell lines harboring characterized insertional mutations in
Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
The results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized, and strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies
The protein sequence and structure databases are now sufficiently representative that strategies nature uses to evolve new catalytic functions can be identified and may provide the basis for discovering the functions of proteins and enzymes in new genomes as well as provide guidance for in vitro evolution/engineering of new enzymes.
An expanded evaluation of protein function prediction methods shows an improvement in accuracy
The second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function, revealed that the definition of top-performing algorithms is ontology specific and that different performance metrics can be used to probe the nature of accurate predictions; it also revealed the relative diversity of predictions in the biological process and human phenotype ontologies.
The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-protons of carboxylic acids
A superfamily of enzymes related by their ability to catalyze the abstraction of the alpha-proton of a carboxylic acid to form an enolic intermediate is discovered, and the established and deduced structure-function relationships in the superfamily allow the prediction that other apparent members of the family for which no catalytic functions have yet been assigned will also perform chemistry involving abstraction of their alpha-protons.
Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies
It is shown that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers in protein superfamilies, and sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.
The Structure–Function Linkage Database
The SFLD subdivides superfamily members into subgroups using sequence information, and finally into families (sets of enzymes known to catalyze the same reaction using the same mechanistic strategy), providing a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.