Biomedical Interpretable Entity Representations

Diego Garcia-Olano, Yasumasa Onoe, Ioana Baldini, Joydeep Ghosh, Byron C. Wallace, Kush R. Varshney
Pre-trained language models induce dense entity representations that offer strong performance on entity-centric NLP tasks, but such representations are not immediately interpretable. This can be a barrier to model uptake in important domains such as biomedicine. There has been recent work on general interpretable representation learning (Onoe and Durrett, 2020), but these domain-agnostic representations do not readily transfer to the important domain of biomedicine. In this paper, we create a… 


Interpretable Entity Representations through Large-Scale Typing
This paper presents an approach to creating entity representations that are human readable and achieve high performance on entity-related tasks out of the box, and shows that these embeddings can be post-hoc modified through a small number of rules to incorporate domain knowledge and improve performance.
MedType: Improving Medical Entity Linking with Semantic Type Prediction
This paper presents MedType, a fully modular system that prunes out irrelevant candidate concepts based on the predicted semantic type of an entity mention, and incorporates it into five off-the-shelf toolkits for medical entity linking and demonstrates that it consistently improves entity linking performance across several benchmark datasets.
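As a rough illustration of the pruning idea described above (not MedType's actual implementation; the function and data here are hypothetical), filtering candidate concepts by a mention's predicted semantic type might look like:

```python
# Hypothetical sketch: given a mention's predicted semantic type,
# keep only candidate concepts whose semantic type matches it.
def prune_candidates(candidates, predicted_type):
    """candidates: list of (concept_id, semantic_type) pairs."""
    return [cid for cid, stype in candidates if stype == predicted_type]

# Example candidate set for a mention predicted to be a "Disease"
candidates = [
    ("C0011849", "Disease"),
    ("C0011860", "Disease"),
    ("C0202054", "Procedure"),
]
print(prune_candidates(candidates, "Disease"))  # ['C0011849', 'C0011860']
```

A downstream entity linker then ranks only the surviving candidates, which is how pruning by type can improve linking accuracy across toolkits.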
Biomedical Entity Representations with Synonym Marginalization
To learn from incomplete synonyms, this paper uses model-based candidate selection and maximizes the marginal likelihood of the synonyms present in the top candidates, avoiding explicit pre-selection of negative samples from more than 400K candidates.
Medical Entity Linking using Triplet Network
A robust and portable candidate generation scheme that does not rely on hand-crafted rules is introduced; it outperforms prior methods by a significant margin.
COMETA: A Corpus for Medical Entity Linking in the Social Media
A new corpus called COMETA is introduced, consisting of 20k English biomedical entity mentions from Reddit, expert-annotated with links to SNOMED CT, a widely used medical knowledge graph. The corpus satisfies a combination of desirable properties that, to the best of the authors' knowledge, has not been met by any existing resource in the field.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora that largely outperforms BERT and previous state-of-the-art models on a variety of biomedical text mining tasks.
EntEval: A Holistic Evaluation Benchmark for Entity Representations
This work proposes EntEval: a test suite of diverse tasks that require nontrivial understanding of entities including entity typing, entity similarity, entity relation prediction, and entity disambiguation, and develops training techniques for learning better entity representations by using natural hyperlink annotations in Wikipedia.
MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts
To encourage research in Biomedical Named Entity Recognition and Linking, data splits for training and testing are included in the release, and a baseline model and its metrics for entity linking are also described.
Ultra-Fine Entity Typing
A model that can predict ultra-fine types is presented; it is trained with a multitask objective that pools the authors' new head-word supervision with prior supervision from entity linking, achieves state-of-the-art performance on an existing fine-grained entity typing benchmark, and sets baselines for newly introduced datasets.
Language Models as Knowledge Bases?
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.