Information Retrieval and Text Mining Technologies for Chemistry.

@article{Krallinger2017InformationRA,
  title={Information Retrieval and Text Mining Technologies for Chemistry.},
  author={Martin Krallinger and Obdulia Rabal and An{\'a}lia Lourenço and Julen Oyarz{\'a}bal and Alfonso Valencia},
  journal={Chemical Reviews},
  year={2017},
  volume={117},
  number={12},
  pages={7673--7761}
}
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly…
ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents
TLDR
The ChEMU 2020 Lab has shown the viability of automated methods to support the extraction of key information from chemical patents and provides a detailed overview of the ChEMU 2020 corpus and its annotation, showing that inter-annotator agreement is very strong.
NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
TLDR
The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers, and a substantially improved chemical entity tagger is described.
Automatic identification of relevant chemical compounds from patents
TLDR
An automated system that extracts chemical entities from patents and classifies their relevance with high performance is designed, which enables the extension of the Reaxys database by means of automation.
ChemScanner: extraction and re-use(ability) of chemical information from common scientific documents containing ChemDraw files
TLDR
The ChemScanner project aims to support chemists in their efforts to re-use chemistry research data by providing them with missing tools for the automated assembly of reaction data.
Chemical Reaction Reference Resolution in Patents
Many new chemical compounds are reported each year in patent documents, leading to increasing demand for methods for automatic information extraction of chemical compounds and reactions from patents.
Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature
TLDR
It is demonstrated that simple database queries can be used to answer complex "meta-questions" of the published literature that would have previously required laborious, manual literature searches to answer.
Chemical Entity Recognition for MEDLINE Indexing.
TLDR
A collection of 200 MEDLINE titles and abstracts annotated with genes, proteins, inorganic and organic chemicals, as well as other biological molecules is used to evaluate eleven chemical entity recognition systems, where it is sought to identify a tool that effectively recognizes chemical entities for indexing and also performs well on chemical recognition beyond the indexing task.
Opportunities and challenges of text mining in materials research
Research publications are the major repository of scientific knowledge. However, their unstructured and highly heterogenous format creates a significant obstacle to large-scale analysis of the…
Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications
TLDR
This study has developed and validated a data mining approach for extracting text fragments containing descriptions of bioassays, used it to evaluate compounds and their biological activity reported in scientific publications, and found that categorization of papers into relevant and irrelevant may be performed based on machine learning analysis of the abstracts.
DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature
TLDR
DECIMER Segmentation is presented, the first open-source, deep learning-based tool for automated recognition and segmentation of chemical structures from the scientific literature; it is hoped to contribute to the development of comprehensive chemical data extraction workflows.

References

Overview of the chemical compound and drug name recognition (CHEMDNER) task
There is an increasing need to facilitate automated access to information relevant for chemical compounds and drugs described in text, including scientific articles, patents or health agency reports.
CHEMDNER: The drugs and chemical names extraction challenge
TLDR
This task allowed a comparative assessment of the performance of various methodologies using a carefully prepared collection of text manually labeled by specially trained chemists as Gold Standard data, and it is expected that the tools and resources resulting from this effort will have an impact on future developments of chemical text mining applications.
Chemical named entities recognition: a review on approaches and applications
TLDR
This review sketches out dictionary-based, rule-based, machine learning, and hybrid chemical named entity recognition approaches with their applied solutions, and gives an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature
TLDR
This system provides an extensible, chemistry-aware, natural language processing pipeline for tokenization, part-of-speech tagging, named entity recognition, and phrase parsing, and the novel use of multiple rule-based grammars that are tailored for interpreting specific document domains such as textual paragraphs, captions, and tables.
A document processing pipeline for annotating chemical entities in scientific documents
TLDR
A machine learning-based solution for automatic recognition of chemical and drug names in scientific documents is presented, which applies a rich feature set, including linguistic, orthographic, morphological, dictionary matching and local context features.
Linking genes to literature: text mining, information extraction, and retrieval applications for biology
TLDR
This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications.
Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications
TLDR
A good portion of this review is devoted to chemical text mining, and presents the basic concepts and principles underlying the main strategies, and introduces a number of published applications that can be used to build pipelines in topics like drug side effects, toxicity, and protein‐disease‐compound network analysis.
TREC chemical information retrieval – An initial evaluation effort for chemical IR systems
TLDR
The TREC Chemical IR Track focuses on evaluation of search technologies for retrieval and knowledge discovery of digitally stored information on chemical patents and academic journal articles on chemistry.
The CHEMDNER corpus of chemicals and drugs and its annotation principles
TLDR
The CHEMDNER corpus is presented, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task.
Automated Extraction of Information from the Literature on Chemical-CYP3A4 Interactions
TLDR
A text mining system is presented that extracts information on chemical-CYP3A4 interactions using a simple but effective pattern matching method based on the order of three keywords; it will be applicable to interactions of chemicals with any functional proteins, such as enzymes and transporters, simply by changing the list of key verbs.