BioRelEx 1.0: Biological Relation Extraction Benchmark

Authors: Hrant Khachatrian, Lilit Nersisyan, Karen Hambardzumyan, Tigran Galstyan, Anna Hakobyan, Arsen Arakelyan, A. Rzhetsky and A. G. Galstyan
Automatic extraction of relations and interactions between biological entities from scientific literature remains an extremely challenging problem in biomedical information extraction and natural language processing in general. One of the reasons for slow progress is the relative scarcity of standardized and publicly available benchmarks. In this paper we introduce BioRelEx, a new dataset of fully annotated sentences from biomedical literature that capture binding interactions between proteins… 


Benchmarking BioRelEx for Entity Tagging and Relation Extraction

The authors' straightforward benchmarking shows that span-based multi-task architectures like DYGIE achieve absolute improvements of 4.9% in entity tagging and 6% in relation extraction over the previous state of the art, and that incorporating domain-specific information, such as embeddings pre-trained on related domains, boosts performance.
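Relation extraction benchmarks like this one are usually scored by comparing predicted relations against gold annotations. The sketch below computes micro-averaged precision, recall and F1 over per-sentence sets of relations. It assumes exact matching on an unordered entity pair plus a label (binding is treated as symmetric); the helper names `make_relation` and `micro_prf` are hypothetical, and BioRelEx's official scorer may match entities differently.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Assumed representation: (unordered pair of entity names, relation label).
Relation = Tuple[FrozenSet[str], str]


def make_relation(a: str, b: str, label: str = "bind") -> Relation:
    """Build a relation whose entity order does not matter (binding is symmetric)."""
    return (frozenset((a.lower(), b.lower())), label)


def micro_prf(gold: Iterable[Set[Relation]],
              pred: Iterable[Set[Relation]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over a corpus.

    gold and pred are parallel sequences with one set of relations per sentence.
    """
    gold, pred = list(gold), list(pred)
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have one entry per sentence")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # predicted and present in gold
        fp += len(p - g)  # predicted but absent from gold
        fn += len(g - p)  # present in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With toy placeholder names, one correct symmetric match in the first sentence and one hit, one miss and one false positive in the second give precision = recall = F1 = 2/3:

```python
gold = [{make_relation("ProtA", "ProtB")},
        {make_relation("ProtC", "ProtD"), make_relation("ProtC", "ProtE")}]
pred = [{make_relation("ProtB", "ProtA")},
        {make_relation("ProtC", "ProtD"), make_relation("ProtD", "ProtF")}]
print(micro_prf(gold, pred))
```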

A Survey on Event Extraction for Natural Language Understanding: Riding the Biomedical Literature Wave

This paper provides a comprehensive, up-to-date survey of the link between event extraction and natural language understanding, focusing on the biomedical domain, and offers a detailed taxonomy for classifying the contributions proposed by the community.

Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference

KECI takes a collective approach to linking mention spans to entities: it integrates global relational information into local representations using graph convolutional networks, and it uses an attention mechanism to fuse the initial span graph and the knowledge graph into a more refined graph.
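To make "integrating global relational information into local representations using graph convolutional networks" concrete, here is a single generic GCN layer in the Kipf and Welling style, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), written in plain Python. This is not KECI's code: the span graph, features and weights below are made-up toy values, and KECI's attention-based fusion with the knowledge graph is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (assumes len(a[0]) == len(b))."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n symmetric adjacency matrix of the span graph (no self-loops)
    h:   n x d matrix of span features
    w:   d x d' weight matrix
    """
    n = len(adj)
    # Add self-loops so each span keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in A + I.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization so high-degree nodes do not dominate.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]  # element-wise ReLU
```

For example, with three spans where span 0 is linked to spans 1 and 2, 2-dimensional features and identity weights, span 0 absorbs information from both neighbours:

```python
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
h = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, h, w))  # approx. [[0.333, 0.816], [0.408, 0.5], [0.408, 0.5]]
```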

Biological Insights Knowledge Graph: an integrated knowledge graph to support drug development

The main design choices in the implementation of the BIKG graph are described, and different aspects of its life cycle, from graph construction to exploitation, are discussed.

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

This work introduces BigBIO, a community library of 126+ biomedical NLP datasets, currently covering 12 task categories and 10+ languages. It discusses the process for task schema harmonization, data auditing, and contribution guidelines, and outlines two illustrative use cases: zero-shot evaluation of biomedical prompts and large-scale, multi-task learning.



RelEx - Relation extraction using dependency parse trees

RelEx is an approach to relation extraction from free text that applies natural language preprocessing to produce dependency parse trees and then applies a small number of simple rules to these trees.
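As an illustration of what "a small number of simple rules" over a dependency tree can look like, here is one hypothetical rule: an interaction verb whose subject and object are both protein mentions yields a relation. The parse is hand-written with Universal-Dependencies-style labels, and the trigger list and function names are assumptions made for this sketch; RelEx's actual rules and trigger lists are not reproduced here.

```python
from typing import Dict, List, Set, Tuple

# Each token: (word, head_index, dependency_label); head_index -1 marks the root.
Token = Tuple[str, int, str]

# Hypothetical trigger list for illustration only.
INTERACTION_VERBS: Set[str] = {"binds", "activates", "inhibits", "phosphorylates"}


def children(tokens: List[Token], head: int) -> Dict[str, List[int]]:
    """Map each dependency label to the indices of tokens attached to `head`."""
    out: Dict[str, List[int]] = {}
    for i, (_, h, label) in enumerate(tokens):
        if h == head:
            out.setdefault(label, []).append(i)
    return out


def extract_pairs(tokens: List[Token], proteins: Set[str]) -> List[Tuple[str, str, str]]:
    """Rule: PROTEIN <-nsubj- VERB -obj-> PROTEIN, where VERB is an interaction trigger."""
    pairs = []
    for i, (word, _, _) in enumerate(tokens):
        if word.lower() not in INTERACTION_VERBS:
            continue
        deps = children(tokens, i)
        for s in deps.get("nsubj", []):
            for o in deps.get("obj", []):
                agent, target = tokens[s][0], tokens[o][0]
                # Keep the triple only if both arguments are known protein mentions.
                if agent in proteins and target in proteins:
                    pairs.append((agent, word, target))
    return pairs
```

For the toy sentence "ProtA binds ProtB", the rule fires once; replacing the object with a non-protein such as "DNA" yields nothing:

```python
tokens = [("ProtA", 1, "nsubj"), ("binds", -1, "root"), ("ProtB", 1, "obj")]
print(extract_pairs(tokens, {"ProtA", "ProtB"}))  # [('ProtA', 'binds', 'ProtB')]
```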

BioInfer: a corpus for information extraction in the biomedical domain

This paper introduces a corpus targeted at protein, gene, and RNA relationships, which serves as a resource for developing information extraction systems and their components, such as parsers and domain analyzers.

Overview of the protein-protein interaction annotation extraction task of BioCreative II

The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific to each of the basic steps of the PPI extraction pipeline. The challenges identified range from problems in converting full-text articles into a usable format to difficulties in detecting interacting protein pairs and then linking them to their database records.
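Two of the pipeline steps named above, detecting candidate interactor pairs and linking them to database records, can be sketched as follows. This assumes a simple case-insensitive synonym dictionary; the dictionary entries and the `DB:` identifiers are hypothetical placeholders rather than real database records, and a real system would additionally classify each candidate pair as interacting or not.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym dictionary: surface form -> placeholder record ID.
SYNONYMS: Dict[str, str] = {
    "prota": "DB:0001",
    "protein a": "DB:0001",
    "protb": "DB:0002",
}


def normalize(mention: str) -> Optional[str]:
    """Link a mention to a record ID by exact, case-insensitive lookup."""
    return SYNONYMS.get(mention.strip().lower())


def link_pairs(mentions: List[str]) -> List[Tuple[str, str]]:
    """Pair up mentions in one sentence, then link each side to a record ID.

    Pairs with an unlinkable side, or whose two sides resolve to the same
    record, are dropped; duplicates collapse to one unordered ID pair.
    """
    linked = set()
    for a, b in combinations(mentions, 2):
        id_a, id_b = normalize(a), normalize(b)
        if id_a is None or id_b is None or id_a == id_b:
            continue
        linked.add(tuple(sorted((id_a, id_b))))
    return sorted(linked)
```

With the mentions "ProtA", "Protein A", "ProtB" and an unknown "ProtX", the two synonyms of the same protein collapse and only one linked candidate remains:

```python
print(link_pairs(["ProtA", "Protein A", "ProtB", "ProtX"]))  # [('DB:0001', 'DB:0002')]
```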

Comparative experiments on learning information extractors for proteins and their interactions

Identifying Protein-Protein Interaction Using Tree LSTM and Structured Attention

This paper proposes a novel tree recurrent neural network architecture with structured attention for PPI identification, which achieves state-of-the-art precision, recall, and F1-score on the AIMed and BioInfer benchmark datasets.
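The building block behind such tree recurrent networks is a cell that combines a node's input with the states of its children in a parse tree. The sketch below implements the standard Child-Sum Tree-LSTM cell of Tai, Socher and Manning (2015) in plain Python. Whether the paper above uses exactly this variant is an assumption, and its structured attention component is not shown; the parameter layout (dicts keyed by gate name) and the function name `child_sum_cell` are choices made for this illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
State = Tuple[Vector, Vector]  # (hidden state h, memory cell c)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Sequence[float]) -> Vector:
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v: Sequence[float]) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def child_sum_cell(x: Sequence[float], kids: List[State], W: Dict[str, Matrix],
                   U: Dict[str, Matrix], b: Dict[str, Vector]) -> State:
    """Child-Sum Tree-LSTM cell; W, U, b hold parameters for gates 'i', 'f', 'o', 'u'."""
    d = len(b["i"])
    # Sum of children's hidden states (zeros for a leaf).
    h_sum = add(*[h for h, _ in kids]) if kids else [0.0] * d
    i = sigmoid(add(matvec(W["i"], x), matvec(U["i"], h_sum), b["i"]))  # input gate
    o = sigmoid(add(matvec(W["o"], x), matvec(U["o"], h_sum), b["o"]))  # output gate
    u = [math.tanh(z) for z in add(matvec(W["u"], x), matvec(U["u"], h_sum), b["u"])]
    c = [ig * ug for ig, ug in zip(i, u)]
    wx_f = matvec(W["f"], x)
    for h_k, c_k in kids:
        # One forget gate per child, conditioned on that child's hidden state.
        f_k = sigmoid(add(wx_f, matvec(U["f"], h_k), b["f"]))
        c = add(c, [f * ck for f, ck in zip(f_k, c_k)])
    h = [og * math.tanh(cg) for og, cg in zip(o, c)]
    return h, c
```

Composing a tree bottom-up is then a matter of calling the cell on leaves first and passing their `(h, c)` states to the parent, e.g. `root = child_sum_cell(x_root, [leaf_1, leaf_2], W, U, b)`.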

BioCreative V track 4: a shared task for the extraction of causal network information using the Biological Expression Language

The aim of this evaluation method is to help identify the characteristics of the systems which, if combined, would be most useful for achieving the overall goal of automatically constructing causal biological networks from text.

Overview of BioNLP Shared Task 2013

The BioNLP Shared Task 2013 shows advances in the state of the art and demonstrates that extraction methods can be successfully generalized in various aspects.

The BioGRID interaction database: 2019 update

A new dedicated component of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene–phenotype and gene–gene relationships. BioGRID also captures chemical interaction data, including chemical–protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature.

The GENIA Corpus: Annotation Levels and Applications

The GENIA corpus, consisting of 1,999 MEDLINE abstracts, has been continually enriched with various levels of syntactic, semantic and discourse-level annotation, making it suitable for training various types of systems.