Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications

@article{Vazquez2011TextMF,
  title={Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications},
  author={Miguel Vazquez and Martin Krallinger and Florian Leitner and Alfonso Valencia},
  journal={Molecular Informatics},
  year={2011},
  volume={30}
}
Providing prior knowledge about biological properties of chemicals, such as kinetic values, protein targets, or toxic effects, can facilitate many aspects of drug development. Chemical information is rapidly accumulating in all sorts of free text documents like patents, industry reports, or scientific articles, which has motivated the development of specifically tailored text mining applications. Despite the potential gains, chemical text mining still faces significant challenges. One of the…
Overview of the chemical compound and drug name recognition (CHEMDNER) task
There is an increasing need to facilitate automated access to information relevant for chemical compounds and drugs described in text, including scientific articles, patents or health agency reports.
A document processing pipeline for annotating chemical entities in scientific documents
A machine learning-based solution for automatic recognition of chemical and drug names in scientific documents is presented, which applies a rich feature set, including linguistic, orthographic, morphological, dictionary matching and local context features.
Chemical named entities recognition: a review on approaches and applications
This review sketches out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions, and an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track
This work organized the first shared task on detecting drug and chemical entities in Spanish medical documents, named PharmaCoNER, and generated annotation guidelines together with a corpus of 1,000 manually annotated clinical case studies to foster the development of new resources for clinical and biomedical text mining systems of Spanish medical data.
CHEMDNER: The drugs and chemical names extraction challenge
This task allowed a comparative assessment of the performance of various methodologies using a carefully prepared collection of text manually labeled by specially trained chemists as Gold Standard data, and it is expected that the tools and resources resulting from this effort will have an impact on future developments of chemical text mining applications.
LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes
Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ-level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis), and integrates a range of text mining, named entity recognition and information extraction components.
CheNER: a tool for the identification of chemical entities and their classes in biomedical literature
CheNER presents a valid alternative for automated annotation of chemical entities in biomedical documents and may be used to derive new features to train newer methods for tagging chemical entities.
Information Retrieval and Text Mining Technologies for Chemistry.
This Review provides a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting information demands of chemical information contained in scientific literature, patents, technical reports, or the web.
The CHEMDNER corpus of chemicals and drugs and its annotation principles
The CHEMDNER corpus is presented, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task.
Mining Molecular Pharmacological Effects from Biomedical Text: a Case Study for Eliciting Anti‐Obesity/Diabetes Effects of Chemical Compounds
INFUSIS, the text mining system presented here, extracts data on chemical compounds from PubMed abstracts, extracts assertions regarding the pharmacological effects of each given compound, and scores them by relevance.

References

SHOWING 1-10 OF 102 REFERENCES
Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text
Pharmspresso is a text analysis tool that automatically extracts pharmacogenomic concepts from the literature and thus captures the current understanding of gene-drug interactions in a computable form.
Mining connections between chemicals, proteins, and diseases extracted from Medline annotations
It is hypothesized, based on protein annotations, that zinc and retinoic acid may play a role in migraine, and the ChemoText repository has promise as a data source for drug discovery.
Analysis of biological processes and diseases using text mining approaches.
An overview of disease-centric and gene-centric literature mining methods for linking genes to phenotypic and genotypic aspects and recent efforts for finding biomarkers through text mining and for gene list analysis and prioritization are discussed.
Identification of Chemical Entities in Patent Documents
A chemical entity recognizer that uses a machine learning approach based on conditional random fields (CRF) is presented, and its performance is compared with dictionary-based approaches using several terminological resources.
A dictionary to identify small molecules and drugs in free text
A dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus, is developed.
PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites
The PolySearch web server is a web-based tool that supports comprehensive queries in genomics, proteomics or metabolomics, and exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences.
Detection of IUPAC and IUPAC-like chemical names
This work presents a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text as well as its evaluation and the conversion rate with available name-to-structure tools.
Automated Extraction of Information from the Literature on Chemical-CYP3A4 Interactions
A text mining system that extracts information on chemical-CYP3A4 interactions using a simple but effective pattern matching method based on the order of three keywords will be applicable to interactions of chemicals with any functional proteins, such as enzymes and transporters, simply by changing the list of key verbs.
Linking genes to literature: text mining, information extraction, and retrieval applications for biology
This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications.
Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge
A common characteristic observed in all three tasks was that the combination of system outputs could yield better results than any single system, including the development of the first text-mining meta-server.