• Corpus ID: 245124471

An Empirical Study on Relation Extraction in the Biomedical Domain

  • Yongkang Li
  • Published 11 December 2021
  • Computer Science
  • ArXiv
Relation extraction is a fundamental problem in natural language processing. Most existing models are defined for relation extraction in the general domain. However, their performance on specific domains (e.g., biomedicine) remains unclear. To fill this gap, this paper carries out an empirical study on relation extraction in biomedical research articles. Specifically, we consider both sentence-level and document-level relation extraction, and run a few state-of-the-art methods on several benchmark…
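The task the abstract describes can be made concrete with a toy sketch. This is not any model evaluated in the paper: it uses hypothetical keyword cues purely to illustrate the sentence-level formulation, i.e. given a sentence and a marked (head, tail) entity pair, predict a relation label.

```python
# Toy sketch of sentence-level relation extraction: classify the relation
# between a marked (head, tail) entity pair using simple keyword cues.
# Cue lists and labels are illustrative assumptions, not from the paper.

RELATION_CUES = {
    "treats": ["treats", "treatment for", "alleviates"],
    "causes": ["causes", "induces", "leads to"],
}

def extract_relation(sentence: str, head: str, tail: str) -> str:
    """Return a relation label for (head, tail), or 'no_relation'."""
    text = sentence.lower()
    # Both entities must actually appear in the sentence.
    if head.lower() not in text or tail.lower() not in text:
        return "no_relation"
    for label, cues in RELATION_CUES.items():
        if any(cue in text for cue in cues):
            return label
    return "no_relation"

print(extract_relation("Aspirin treats headache in most patients.",
                       "Aspirin", "headache"))  # -> treats
```

The neural systems benchmarked in the paper replace the keyword lookup with a learned encoder and classifier, but the input/output contract is the same.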

Tables from this paper


BioCreative V CDR task corpus: a resource for chemical disease relation extraction
The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.
RENET: A Deep Learning Approach for Extracting Gene-Disease Associations from Literature
A deep learning approach named RENET is designed and implemented, which considers the correlation between the sentences in an article to extract gene-disease associations and significantly improves precision and recall.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora that largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks.
DocRED: A Large-Scale Document-Level Relation Extraction Dataset
Empirical results show that DocRED is challenging for existing RE methods, which indicates that document-level RE remains an open problem and requires further efforts.
Matching the Blanks: Distributional Similarity for Relation Learning
This paper builds on extensions of Harris’ distributional hypothesis to relations, as well as recent advances in learning text representations (specifically, BERT), to build task agnostic relation representations solely from entity-linked text.
Distant supervision for relation extraction without labeled data
This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.
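The paradigm summarized above can be sketched briefly: any sentence that mentions both entities of a knowledge-base triple is (noisily) labeled with that triple's relation, yielding training data without manual annotation. The triples and sentences below are hypothetical examples, not data from the cited work.

```python
# Distant supervision sketch: auto-label sentences by aligning them with
# knowledge-base triples. Labels are noisy by design; the KB entries here
# are hypothetical illustrations.

KB = {
    ("aspirin", "headache"): "treats",
    ("smoking", "cancer"): "causes",
}

def label_sentences(sentences):
    """Return (sentence, head, tail, relation) tuples for KB matches."""
    labeled = []
    for sent in sentences:
        text = sent.lower()
        for (head, tail), rel in KB.items():
            if head in text and tail in text:
                labeled.append((sent, head, tail, rel))
    return labeled

corpus = ["Aspirin is often used for headache relief.",
          "Smoking has been linked to lung cancer."]
for example in label_sentences(corpus):
    print(example)
```

Because the matching heuristic ignores context, downstream models typically pair this labeling with noise-tolerant training, which is part of what makes the paradigm domain-independent.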
Document-level Relation Extraction as Semantic Segmentation
A Document U-shaped Network for document-level relation extraction is proposed, which leverages an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples.
SciBERT: A Pretrained Language Model for Scientific Text
SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
Position-aware Attention and Supervised Data Improve Slot Filling
An effective new model is proposed, combining an LSTM sequence model with a form of entity position-aware attention better suited to relation extraction; the work also builds TACRED, a large supervised relation extraction dataset obtained via crowdsourcing and targeted toward TAC KBP relations.