BERN2: an advanced neural biomedical named entity recognition and normalization tool

@article{Sung2022BERN2AA,
  title={BERN2: an advanced neural biomedical named entity recognition and normalization tool},
  author={Mujeen Sung and Minbyul Jeong and Yonghwa Choi and Donghyeon Kim and Jinhyuk Lee and Jaewoo Kang},
  journal={Bioinformatics},
  year={2022},
  volume={38},
  pages={4837--4839}
}
Abstract: In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g. diseases and drugs) from the ever-growing biomedical literature. In this article, we present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by employing a multi-task NER model and neural network-based NEN models to…

Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework

Kazu is a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. It is built around a computationally efficient version of the BERN2 NER model (TinyBERN2) and wraps several other BioNLP technologies into one coherent system.

A comprehensive review on knowledge graphs for complex diseases.

A systematic review to characterize the state-of-the-art of KGs in the area of complex disease research, covering the following topics: knowledge sources, entity extraction methods, relation extraction methods and the application of KG in complex diseases.

SciFact-Open: Towards open-domain scientific claim verification

This work presents SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems on a corpus of 500K research abstracts. It draws upon pooling techniques from information retrieval, collecting evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models.

Enhancing Label Consistency on Document-level Named Entity Recognition

This paper presents the method, ConNER, which enhances the label dependency of modifiers (e.g., adjectives and prepositions) to achieve higher label agreement in NER models, and demonstrates how the approach makes the NER model generate consistent predictions.

Full-text chemical identification with improved generalizability and tagging consistency

This paper identifies two limitations of models in tagging full-text articles: (1) low generalizability to unseen mentions and (2) tagging inconsistency. It also presents a hybrid model for the normalization task that utilizes the high recall of a neural model while maintaining the high precision of a dictionary model.

The Bioregistry: Unifying the Identification of Biomedical Entities through an Integrative, Open, Community-driven Metaregistry

The Bioregistry is introduced, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 20 existing registries and can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse.

Unifying the identification of biomedical entities with the Bioregistry

The Bioregistry is introduced, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 23 existing registries and can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse.

Extractive Search for Analysis of Biomedical Texts

This work presents a two-stage system that creates custom datasets using a powerful mix of keyword and syntactic matching, and then returns lists of related words, which are used in downstream biomedical work.

References

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

It is shown that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.

Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art

A large-scale study is presented across 18 established biomedical and clinical NLP tasks to determine which of several popular open-source biomedical and clinical NLP models work well in different settings, and applies recent advances in pretraining to train new biomedical language models.

Vapur: A Search Engine to Find Related Protein - Compound Pairs in COVID-19 Literature

Vapur is an online COVID-19 search engine specifically designed for finding related protein - chemical pairs, empowered with a biochemically related entities-oriented inverted index in order to group studies relevant to a biomolecule with respect to its related entities.

HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition

HunFlair is integrated into the widely-used NLP framework Flair, recognizes five biomedical entity types, reaches or overcomes state-of-the-art performance on a wide set of evaluation corpora, and is trained in a cross-corpus setting to avoid corpus-specific bias.

Building a PubMed knowledge graph

A PubMed knowledge graph (PKG) was constructed by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID ®, and identifying fine-grained affiliation data from MapAffil.

Biomedical Entity Representations with Synonym Marginalization

To learn from the incomplete synonyms, this paper uses model-based candidate selection and maximizes the marginal likelihood of the synonyms present in the top candidates, avoiding the explicit pre-selection of negative samples from more than 400K candidates.
...