Text mining meets community curation: a newly designed curation platform to improve author experience and participation at WormBase

Authors: V. Arnaboldi, D. Raciti, K. V. Auken, J. Chan, Hans-Michael Müller, P. Sternberg
Journal: Database: The Journal of Biological Databases and Curation
Abstract Biological knowledgebases rely on expert biocuration of the research literature to maintain up-to-date collections of data organized in machine-readable form. To enter information into knowledgebases, curators need to follow three steps: (i) identify papers containing relevant data, a process called triaging; (ii) recognize named entities; and (iii) extract and curate data in accordance with the underlying data models. WormBase (WB), the authoritative repository for research data on…
UniProt: the universal protein knowledgebase in 2021
UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries, which were rapidly made available to the research community through a dedicated portal; a credit-based publication submission interface was also developed.
A behind‐the‐scenes tour of the IEDB curation process: an optimized process empirically integrating automation and human curation efforts
Insight is provided into these processes, with particular focus on the dividends they have paid in terms of attaining project milestones, as well as how objective analyses of the authors' processes have identified opportunities for process optimization.
A hybrid approach toward biomedical relation extraction training corpora: combining distant supervision with crowdsourcing
A detailed pipeline for RE crowdsourcing validation is described, creating a new release of the PGR dataset with partial domain expert revision, and assessing the quality of the MTurk platform.


Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR
It is found that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.
Directly e-mailing authors of newly published papers encourages community curation
An automated method is described that directly e-mails corresponding authors of new papers, requesting that they use an online tool to list the genes studied and indicate the types of data described in the paper. As a result, FlyBase curators now spend less time triaging and can devote more effort to the specialized task of detailed data extraction.
Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature
The next generation of the Textpresso information retrieval system, Textpresso Central (TPC), builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography.
Automatic categorization of diverse experimental information in the bioscience literature
An automatic method, based on the Support Vector Machine (SVM) machine learning method, is developed for identifying papers containing these curation data types among a large pool of published scientific papers; it can be readily incorporated into different workflows at different literature-based databases.
Canto: an online tool for community literature curation
Canto is a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase, and supports curation using OBO ontologies.
Assessment of community-submitted ontology annotations from a novel database-journal partnership
Analysis of a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity.
Toward an interactive article: integrating journals and biological databases
A semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases such as WormBase and uses a lexicon built with entities from the database as a first step, which results in interactive articles that are data rich with high accuracy.
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges
It is argued that text-mining technologies have become essential tools in real-world biomedical research, and increased collaboration is called for between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.
Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature
Extraction of particular biological facts can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency.
An effective biomedical document classification scheme in support of biocuration: addressing class imbalance
This work presents an effective classification scheme for automatically identifying, among a large pool of biomedical publications, papers that contain information relevant to a specific topic the curators are interested in annotating. The scheme is based on a meta-classification framework using cluster-based under-sampling combined with named-entity recognition and statistical feature selection strategies.