Corpus ID: 243947622

Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles

@article{Kim2021ImprovingTC,
  title={Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles},
  author={Hyunjae Kim and Mujeen Sung and Wonjin Yoon and Sungjoon Park and Jaewoo Kang},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.10584}
}
This paper is a technical report on our system submitted to the chemical identification task of the BioCreative VII Track 2 challenge. The main feature of this challenge is that the data consists of full-text articles, while current datasets usually consist of only titles and abstracts. To effectively address the problem, we aim to improve tagging consistency and entity coverage using various methods such as majority voting within the same articles for named entity recognition (NER) and a… 
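The majority-voting idea mentioned above can be illustrated with a minimal sketch (not the authors' implementation; the function and data below are hypothetical): after a base NER model tags each mention in a full-text article, every occurrence of a surface form is re-labeled with that form's most frequent predicted label, so identical strings are tagged consistently across the document.

```python
from collections import Counter, defaultdict

def majority_vote(mentions):
    """Harmonize NER labels within one article by majority voting.

    `mentions` is a list of (surface_form, label) predictions from a
    base NER model over the same article. Each occurrence of a surface
    form is re-labeled with that form's most frequent predicted label.
    """
    votes = defaultdict(Counter)
    for surface, label in mentions:
        votes[surface.lower()][label] += 1
    # Pick the winning label per (case-normalized) surface form.
    consensus = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return [(surface, consensus[surface.lower()]) for surface, _ in mentions]

# Toy example: one inconsistent "aspirin" prediction gets corrected.
preds = [
    ("aspirin", "CHEM"), ("aspirin", "O"), ("aspirin", "CHEM"),
    ("water", "CHEM"),
]
print(majority_vote(preds))
```

Here the lone `("aspirin", "O")` prediction is overruled by the two `CHEM` votes, which is the kind of within-article consistency the paper targets; tie-breaking and span-level voting details would depend on the actual system.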

References

SHOWING 1-10 OF 17 REFERENCES
The overview of the NLM-Chem BioCreative VII track Full-text Chemical Identification and Indexing in PubMed articles
TLDR
This community challenge demonstrated that 1) current substantial achievements in deep learning technologies can be utilized to further improve automated prediction accuracy, and 2) the Chemical Indexing task is substantially more challenging.
The CHEMDNER corpus of chemicals and drugs and its annotation principles
TLDR
The CHEMDNER corpus is presented, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task.
NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
TLDR
The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers, and a substantially improved chemical entity tagger is described.
The chemical corpus of the NLM-Chem BioCreative VII track Full-text Chemical Identification and Indexing in PubMed articles
TLDR
Using the NLM-Chem BioCreative VII corpus, a high-quality manually annotated corpus of 200 full-text PubMed central articles, improvements in the chemical entity recognition algorithms are demonstrated.
Biomedical Entity Representations with Synonym Marginalization
TLDR
To learn from incomplete synonyms, this paper uses model-based candidate selection and maximizes the marginal likelihood of the synonyms present in the top candidates, avoiding the explicit pre-selection of negative samples from more than 400K candidates.
BioCreative V CDR task corpus: a resource for chemical disease relation extraction
TLDR
The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.
An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition
TLDR
A neural network approach, i.e., attention-based bidirectional Long Short-Term Memory with a conditional random field layer (Att-BiLSTM-CRF), to document-level chemical NER that achieves better performance with little feature engineering than other state-of-the-art methods.
Leveraging Document-Level Label Consistency for Named Entity Recognition
TLDR
This work introduces a novel two-stage label refinement approach to handle document-level label consistency, where a key-value memory network is first used to record draft labels predicted by the base model, and then a multi-channel Transformer refines these draft predictions based on the explicit co-occurrence relationships derived from the memory network.
Hierarchical Contextualized Representation for Named Entity Recognition
TLDR
This paper proposes a model augmented with hierarchical contextualized representations, at the sentence level and the document level, that takes the different contributions of words in a single sentence into consideration to enhance the sentence representation learned from an independent BiLSTM via a label embedding attention mechanism.
How Do Your Biomedical Named Entity Models Generalize to Novel Entities?
TLDR
It is found that although BioNER models achieve state-of-the-art performance on BioNER benchmarks based on overall performance, they have limitations in identifying synonyms and new biomedical concepts such as COVID-19.