Corpus ID: 30300780

Annotating chemicals, diseases and their interactions in biomedical literature

@inproceedings{Li2015AnnotatingC,
  title={Annotating chemicals, diseases and their interactions in biomedical literature},
  author={Jiao Li and Yueping Sun and Robin J. Johnson and Daniela Sciaky and Chih-Hsuan Wei and Robert Leaman and Allan Peter Davis and Carolyn J. Mattingly and Thomas C. Wiegers and Zhiyong Lu},
  year={2015}
}
Community-run formal evaluations and manually annotated text corpora are critically important for advancing biomedical text mining research. Recently in BioCreative V, a new challenge was organized for the tasks of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. Given the nature of both tasks, a test collection is required to contain both disease/chemical annotations and relation annotations in the same set of articles. Despite previous efforts in… 

BioCreative V CDR task corpus: a resource for chemical disease relation extraction
TLDR
The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.
Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task
TLDR
This task was found to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction.
Semantic annotation in biomedicine: the current landscape
TLDR
This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS, focusing particularly on general purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine.
Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall
TLDR
The authors' text mining solution, LeadMine, was extended to identify diseases and chemical-induced disease relationships (CIDs); applying the same system to the entirety of MEDLINE yielded a collection of over 250 000 distinct CIDs.
Chemical-induced disease relation extraction with various linguistic features
TLDR
A machine-learning-based system that uses simple yet effective linguistic features to extract relations with maximum entropy models, together with hypernym relations between entity concepts derived from the Medical Subject Headings (MeSH) controlled vocabulary, to obtain more accurate classification models and better extraction performance.
Extracting structured chemical-induced disease relations from free text via crowdsourcing
Relationships between chemicals and diseases are important for biomedical research. Assembling databases of these relations is costly and often relies on expert curation. Here, we describe a
A crowdsourcing workflow for extracting chemical-induced disease relations from free text
TLDR
A crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge, which revealed that performance can still be improved.
BioCreative V CDR Task: Identifying Chemical-induced Disease Relations in Biomedical Text
This paper describes the system developed by the UTH-CCB team from the University of Texas Health Science Center at Houston (UTHealth), for the 2015 BioCreative V shared tasks of Track 3 on
A knowledge-poor approach to chemical-disease relation extraction
TLDR
A knowledge-poor approach to the task of extracting Chemical-Disease Relations from PubMed abstracts based on machine learning techniques integrated with a limited number of domain-specific knowledge resources and using freely available tools for preprocessing data.
A corpus for plant-chemical relationships in the biomedical domain
TLDR
A corpus for plant and chemical entities and for the relationships between them is constructed and a rule-based model to automatically extract such plant–chemical relationships is developed.

References

The CHEMDNER corpus of chemicals and drugs and its annotation principles
TLDR
The CHEMDNER corpus is presented, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task.
The EU-ADR corpus: Annotated drugs, diseases, targets, and their relationships
TLDR
This paper describes an approach where a named-entity recognition system produces a first annotation and annotators revise this annotation using a web-based interface, showing that the inter-annotator agreement is much better than the agreement with the system provided annotations.
NCBI disease corpus: A resource for disease name recognition and concept normalization
TLDR
The results show that the NCBI disease corpus has the potential to significantly improve the state-of-the-art in disease name recognition and normalization research, by providing a high-quality gold standard thus enabling the development of machine-learning based approaches for such tasks.
An improved corpus of disease mentions in PubMed citations
TLDR
A large-scale disease corpus of 6900 disease mentions in 793 PubMed citations is created from an earlier corpus; its rich annotations make it a valuable resource for mining disease-related information from biomedical text.
The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text
TLDR
The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging, and text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results.
The curation paradigm and application tool used for manual curation of the scientific literature at the Comparative Toxicogenomics Database
TLDR
This approach to manual curation that uses a powerful and efficient paradigm involving mnemonic codes is incorporated into a web-based curation tool to further increase efficiency and productivity, implement quality control in real-time and accommodate biocurators working remotely.
Semi-automatic semantic annotation of PubMed queries: A study on quality, efficiency, satisfaction
TLDR
This study shows that most annotators find automatic pre-annotations helpful, and suggests using an automatic tool to assist large-scale manual annotation projects, reducing annotation time and improving annotation consistency while maintaining high quality in the final annotations.
The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions
TLDR
A manually annotated corpus of 792 texts selected from the DrugBank database and 233 other Medline abstracts, annotated with a total of 18,502 pharmacological substances and 5028 DDIs, including both PK and PD interactions; the results show that the corpus is of sufficient quality for training and testing NLP techniques applied to the field of pharmacovigilance.
An analysis on the entity annotations in biological corpora
TLDR
An overview of 36 corpora is presented, along with an analysis of the semantic annotations they contain; the results show that while some semantic entities, such as genes, proteins and chemicals, are consistently annotated in many collections, corpora for diseases, variations and mutations are still few, in spite of their importance in the biological domain.
A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions
TLDR
A collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity.