Publications
Universal Dependencies v1: A Multilingual Treebank Collection
This paper describes v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages, and highlights the need for sound comparative evaluation and cross-lingual learning experiments.
brat: a Web-based Tool for NLP-Assisted Text Annotation
The brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology, is introduced, along with an evaluation of annotation assisted by semantic class disambiguation on a multicategory entity mention annotation task, which shows a 15% decrease in total annotation time.
Overview of BioNLP’09 Shared Task on Event Extraction
The design and implementation of the BioNLP'09 Shared Task is presented, indicating that state-of-the-art performance is approaching a practically applicable level and revealing some remaining challenges.
BioInfer: a corpus for information extraction in the biomedical domain
A corpus targeted at protein, gene, and RNA relationships is introduced; it serves as a resource for the development of information extraction systems and their components, such as parsers and domain analyzers.
All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning
A detailed evaluation of the effects of training and testing on different resources is performed, providing insight into the challenges involved in applying a system beyond the data it was trained on. Several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid, are also identified.
Distributional Semantics Resources for Biomedical Text Processing
This study introduces the first set of such language resources created from analysis of the entire available biomedical literature, including a dataset of all 1- to 5-grams and their probabilities in these texts and new models of word semantics.
Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation
The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets
Changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks are described.
How to Train good Word Embeddings for Biomedical NLP
It is found that bigger corpora do not necessarily produce better biomedical domain word embeddings, and in one case contradictory results between intrinsic and extrinsic evaluations are observed.
Comparative analysis of five protein-protein interaction corpora
The first comparative evaluation of the diverse PPI corpora is presented, comprising a quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of the corpora's properties.