BERN2: an advanced neural biomedical named entity recognition and normalization tool
@article{Sung2022BERN2AA, title={BERN2: an advanced neural biomedical named entity recognition and normalization tool}, author={Mujeen Sung and Minbyul Jeong and Yonghwa Choi and Donghyeon Kim and Jinhyuk Lee and Jaewoo Kang}, journal={Bioinformatics}, year={2022}, volume={38}, pages={4837--4839} }
Abstract In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g. diseases and drugs) from the ever-growing biomedical literature. In this article, we present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by employing a multi-task NER model and neural network-based NEN models to…
8 Citations
Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework
- Computer Science
- ArXiv
- 2022
Kazu is a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. It is built around a computationally efficient version of the BERN2 NER model (TinyBERN2) and wraps several other BioNLP technologies into one coherent system.
A comprehensive review on knowledge graphs for complex diseases.
- Medicine
- Briefings in Bioinformatics
- 2022
A systematic review to characterize the state-of-the-art of KGs in the area of complex disease research, covering the following topics: knowledge sources, entity extraction methods, relation extraction methods and the application of KG in complex diseases.
SciFact-Open: Towards open-domain scientific claim verification
- Computer Science
- ArXiv
- 2022
This work presents SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems on a corpus of 500K research abstracts. It draws upon pooling techniques from information retrieval, collecting evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models.
Enhancing Label Consistency on Document-level Named Entity Recognition
- Medicine
- ArXiv
- 2022
This paper presents the method, ConNER, which enhances the label dependency of modifiers (e.g., adjectives and prepositions) to achieve higher label agreement in NER models, and demonstrates how the approach makes the NER model generate consistent predictions.
Full-text chemical identification with improved generalizability and tagging consistency
- Computer Science
- Database J. Biol. Databases Curation
- 2022
This paper identifies two limitations of models in tagging full-text articles, (1) low generalizability to unseen mentions and (2) tagging inconsistency, and presents a hybrid model for the normalization task that utilizes the high recall of a neural model while maintaining the high precision of a dictionary model.
The Bioregistry: Unifying the Identification of Biomedical Entities through an Integrative, Open, Community-driven Metaregistry
- Biology, Computer Science
- 2022
The Bioregistry is introduced, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 20 existing registries and can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse.
Unifying the identification of biomedical entities with the Bioregistry
- Biology, Computer Science
- bioRxiv
- 2022
The Bioregistry is introduced, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 23 existing registries and can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse.
Extractive Search for Analysis of Biomedical Texts
- Computer Science
- SIGIR
- 2022
This work presents a two-stage system that creates custom datasets using a powerful mix of keyword and syntactic matching, and then returns lists of related words, which are used in downstream biomedical work.
References
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
- Computer Science
- ACM Trans. Comput. Heal.
- 2022
It is shown that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.
Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art
- Computer Science, Biology
- CLINICALNLP
- 2020
A large-scale study across 18 established biomedical and clinical NLP tasks determines which of several popular open-source biomedical and clinical NLP models work well in different settings, and applies recent advances in pretraining to train new biomedical language models.
Vapur: A Search Engine to Find Related Protein - Compound Pairs in COVID-19 Literature
- Computer Science
- bioRxiv
- 2020
Vapur is an online COVID-19 search engine specifically designed for finding related protein - chemical pairs, empowered with a biochemically related entities-oriented inverted index in order to group studies relevant to a biomolecule with respect to its related entities.
HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition
- Computer Science
- Bioinform.
- 2021
HunFlair is integrated into the widely-used NLP framework Flair, recognizes five biomedical entity types, reaches or overcomes state-of-the-art performance on a wide set of evaluation corpora, and is trained in a cross-corpus setting to avoid corpus-specific bias.
Building a PubMed knowledge graph
- Computer Science
- Scientific Data
- 2020
A PubMed knowledge graph (PKG) was constructed by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil.
Biomedical Entity Representations with Synonym Marginalization
- Computer Science, Biology
- ACL
- 2020
To learn from the incomplete synonyms, this paper uses model-based candidate selection and maximizes the marginal likelihood of the synonyms present in top candidates, avoiding the explicit pre-selection of negative samples from more than 400K candidates.