Corpus ID: 9839235

Overview of the CHEMDNER patents task

Martin Krallinger, O. Rabal, A. Lourenço, M. Pérez, Gael Pérez Rodríguez, M. Vazquez, F. Leitner, J. Oyarzábal, A. Valencia

A considerable effort has been made to extract biological and chemical entities, as well as their relationships, from the scientific literature, either manually through traditional literature curation or by using information extraction and text mining technologies. Medicinal chemistry patents contain a wealth of information that could, for instance, help uncover potential biomarkers that might play a role in cancer treatment and prognosis. However, current biomedical annotation databases do not cover such…

Recognizing chemicals in patents: a comparative analysis
The results indicate that full patents are considerably harder to analyze than patent abstracts, and they clearly confirm the common wisdom that using the same text genre (patent vs. scientific) and text type (abstract vs. full text) for training and testing is a prerequisite for achieving high-quality text mining results.
Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning
Two strategies were proposed for feature engineering: (1) domain knowledge features derived from dictionaries, chemical structural patterns, and semantic type information present in the context of the candidate chemical; and (2) unsupervised feature learning algorithms that generate word representation features through Brown clustering and a novel binarized word embedding, enhancing the generalizability of the system.
ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents
The ChEMU 2020 Lab has shown the viability of automated methods to support the extraction of key information from chemical patents. The overview also describes the ChEMU 2020 corpus and its annotation in detail, showing that inter-annotator agreement is very strong.
Automatic identification of relevant chemical compounds from patents
An automated system is designed that extracts chemical entities from patents and classifies their relevance with high performance, enabling the Reaxys database to be extended by means of automation.
Mining chemical patents with an ensemble of open systems
It is concluded that an ensemble of independently created open systems is sufficiently diverse to significantly improve performance over any individual system, even when they use a similar approach.
Mining Patents with tmChem, GNormPlus and an Ensemble of Open Systems
The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. The CHEMDNER task at BioCreative V focused on information extraction…
Overview of the BioCreative VI chemical-protein interaction Track
The BioCreative VI ChemProt track represents the first attempt to promote the development of systems for extracting chemical-protein interactions (CPIs), of relevance for precision medicine as well as for drug discovery and basic biomedical research.
NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition
NERChem is a system based on the conditional random field (CRF) model that recognizes chemical named entity mentions in chemical patents. It achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems.
A neural network approach to chemical and gene/protein entity recognition in patents
A neural network approach, a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) layer, is employed to recognize biomedical entities in patents, and the effect of additional features on the neural network model is explored.
OCMiner for Patents. Extracting Chemical Information from Patent Texts
This paper describes OCMiner, a high-performance semantic text processing system for large document collections of scientific publications, and its performance regarding chemical named entity…
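The ensemble entry above reports that combining independently created open systems improves on any single system. One common way to combine mention-level outputs is span-level majority voting, sketched below in Python. This is an illustration under stated assumptions, not the method of that paper: each system is assumed to emit chemical mentions as exact (start, end) character-offset spans, and the name `majority_vote` and its default strict-majority threshold are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

# Hypothetical representation: a chemical mention is a (start, end) character-offset span.
Span = Tuple[int, int]


def majority_vote(system_outputs: List[Set[Span]], min_votes: Optional[int] = None) -> Set[Span]:
    """Return the spans predicted by at least `min_votes` systems.

    By default a strict majority of the systems must agree on the exact same span.
    """
    if min_votes is None:
        min_votes = len(system_outputs) // 2 + 1
    # Each system casts at most one vote per span, because its output is a set.
    votes = Counter(span for spans in system_outputs for span in spans)
    return {span for span, count in votes.items() if count >= min_votes}


if __name__ == "__main__":
    sys_a = {(0, 7), (12, 20)}
    sys_b = {(0, 7), (30, 35)}
    sys_c = {(0, 7), (12, 20), (40, 44)}
    # (0, 7) gets 3 votes, (12, 20) gets 2, the others get 1; a majority of 3 systems is 2.
    print(sorted(majority_vote([sys_a, sys_b, sys_c])))  # [(0, 7), (12, 20)]
```

Lowering `min_votes` toward 1 (a union of all systems) tends to raise recall at the cost of precision, while requiring unanimity does the opposite; the threshold is a tuning choice.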


Overview of the chemical compound and drug name recognition (CHEMDNER) task
There is an increasing need to facilitate automated access to information relevant to chemical compounds and drugs described in text, including scientific articles, patents, and health agency reports.
The CHEMDNER corpus of chemicals and drugs and its annotation principles
This paper presents the CHEMDNER corpus, a collection of 10,000 PubMed abstracts containing a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task.
CHEMDNER: The drugs and chemical names extraction challenge
This task allowed a comparative assessment of the performance of various methodologies, using as Gold Standard data a carefully prepared collection of text manually labeled by specially trained chemists. The tools and resources resulting from this effort are expected to have an impact on future developments of chemical text mining applications.
Annotated Chemical Patent Corpus: A Gold Standard for Text Mining
A large gold standard chemical patent corpus is produced using 200 full patents from the World Intellectual Property Organization, the United States Patent and Trademark Office, and the European Patent Office, with annotations marking chemicals (in different subclasses), diseases, targets, and modes of action.
Development and tuning of an original search engine for patent libraries in medicinal chemistry
It is shown that properly tuning the system to the various search tasks clearly increases its effectiveness, and it is concluded that different search tasks demand different information retrieval engine settings to yield optimal end-user retrieval.
TREC-CHEM: large scale chemical information retrieval evaluation at TREC
A chemical IR track (TREC-CHEM) was organized in TREC to address the challenges in chemical and patent IR. The paper describes the accomplishments of the first year and opens up the discussion for the next year.
Applications and Challenges of Text Mining with Patents
This paper gives insight into current research on three text mining tools for patents designed for information professionals, which are used in industry and could be applied in research as well.
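Several entries above (the CHEMDNER task and corpus papers, and the NERChem F-score on CEMP) compare system output against manually labeled Gold Standard data. A minimal Python sketch of mention-level, micro-averaged precision, recall, and F-score follows. It assumes strict exact matching on (document id, start offset, end offset) tuples; the matching criterion, the tuple layout, and the name `evaluate_mentions` are assumptions for illustration, not the official CHEMDNER evaluation tooling.

```python
from typing import Dict, Set, Tuple

# Assumed mention layout: (document id, start offset, end offset).
Mention = Tuple[str, int, int]


def evaluate_mentions(predicted: Set[Mention], gold: Set[Mention]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F-score under strict exact-span matching."""
    true_positives = len(predicted & gold)  # a prediction counts only if its offsets match exactly
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        f_score = 0.0
    else:
        # Balanced F-score: harmonic mean of precision and recall.
        f_score = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f_score": f_score}


if __name__ == "__main__":
    gold = {("d1", 0, 7), ("d1", 12, 20), ("d2", 5, 9)}
    # ("d1", 12, 19) is off by one character, so it counts as both a false positive and a miss.
    predicted = {("d1", 0, 7), ("d1", 12, 19), ("d2", 5, 9), ("d2", 30, 33)}
    print(evaluate_mentions(predicted, gold))
```

Pooling all mentions across documents before computing the scores is what makes this micro-averaged; a macro-averaged variant would instead compute the scores per document and average them.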