CHEMDNER: The drugs and chemical names extraction challenge

@article{Krallinger2015CHEMDNERTD,
  title={CHEMDNER: The drugs and chemical names extraction challenge},
  author={Martin Krallinger and F. Leitner and O. Rabal and M. Vazquez and J. Oyarz{\'a}bal and A. Valencia},
  journal={Journal of Cheminformatics},
  year={2015},
  volume={7},
  pages={S1}
}
Natural language processing (NLP) and text mining technologies for the chemical domain (ChemNLP or chemical text mining) are key to improving access to, and integration of, information from unstructured data such as patents or the scientific literature. Therefore, the BioCreative organizers posed the CHEMDNER (chemical compound and drug name recognition) community challenge, which promoted the development of novel, competitive and accessible chemical text mining systems. This task allowed a…
The CHEMDNER corpus of chemicals and drugs and its annotation principles
TLDR
The CHEMDNER corpus is presented, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task.
ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents
TLDR
The ChEMU 2020 Lab has shown the viability of automated methods to support information extraction of key information in chemical patents and provides a detailed overview of the ChEMU 2020 corpus and its annotation, showing that inter-annotator agreement is very strong.
Overview of the CHEMDNER patents task
A considerable effort has been made to extract biological and chemical entities, as well as their relationships, from the scientific literature, either manually through traditional literature…
A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature
TLDR
A machine learning-based system based on conditional random fields (CRF) and structured support vector machines (SSVM) for the CEM task for this data set showed better performance than the systems using only one type of word representation (WR) features.
tmChem: a high performance approach for chemical named entity recognition and normalization
Chemical compounds and drugs are an important class of entities in biomedical research with great potential in a wide range of applications, including clinical medicine. Locating chemical named…
A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature
TLDR
Though the current system has much room for improvement, this system is valuable in showing that the performance in terms of balanced F-measure can be improved largely by utilizing large amounts of relatively inexpensive un-annotated PubMed abstracts and optimizing the cost parameter in the CRF model.
Adapting ChER for the recognition of chemical mentions in patents
ChER (Chemical Entity Recogniser) is a pipeline of natural language processing tools optimised for the recognition of chemical names in scientific abstracts. It formed the basis of our submissions to…
Patent mining: combining dictionary-based and machine-learning approaches
Exploration of the chemical patent space is essential for early-stage medicinal chemistry activities. The BioCreative CHEMDNER-patents task focuses on the recognition of chemical compounds in…
Information Retrieval and Text Mining Technologies for Chemistry.
TLDR
This Review provides a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting information demands of chemical information contained in scientific literature, patents, technical reports, or the web.
NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
TLDR
The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers, and a substantially improved chemical entity tagger is described.
