Corpus ID: 24132050

Mining protein interactions affected by mutations using a NLP based machine learning approach

Jinchan Qu, Albert Steppi, Jie Hao, Jian Wang, Pei-Yau Lung, Tingting Zhao, Zhe He, Jinfeng Zhang
Knowledge of the protein/gene interactions that are affected by mutations helps us understand phenotype-genotype associations and predict disease prognosis and responses to treatment. Such information is scattered throughout the scientific literature, and its manual curation is very time- and resource-consuming. Although much research has been done in the past to extract protein-protein interaction (PPI) information automatically from literature, much less has been done on extracting PPIs affected by…
1 Citation


BioKDE: a Deep Learning Powered Search Engine and Biomedical Knowledge Discovery Platform
Search engines play important roles in scientific research by helping scientists find articles relevant to a set of keywords. As more and more papers are being published on a daily basis, the amount…


Bayesian inference of protein-protein interactions from biological literature
This study developed a novel methodology based on Bayesian networks for extracting PPI triplets from unstructured text and showed that the method was able to complement human annotations and extract a large number of new PPIs from literature.
The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text
The results of the ACT task of BioCreative III indicate that classifying large, unbalanced article collections that reflect the real class imbalance is still challenging, and that text-mining tools reporting ranked lists of relevant articles for manual selection can potentially cut the time needed to identify half of the relevant articles to less than a quarter of the time required with unranked results.
Overview of the protein-protein interaction annotation extraction task of BioCreative II
The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific to each of the basic steps of the PPI extraction pipeline; the challenges identified range from problems in full-text format conversion of articles to difficulties in detecting interactor protein pairs and then linking them to their database records.
A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature
A comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information shows that three kernels are clearly superior to the other methods, and confirms that kernels using dependency trees generally outperform kernels based on syntax trees.
Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine
This paper proposes a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature; it identifies the genes and protein products associated with each mutation not just from the local text but also from a global context (the Internet and all literature in PubMed).
Comparative analysis of five protein-protein interaction corpora
The first comparative evaluation of diverse PPI corpora is presented, with quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of the corpora's properties.
IMID: integrated molecular interaction database
This study integrates molecular interaction information obtained from literature by automatic information extraction with information from manually annotated databases to build an integrated molecular interaction database (IMID), which supports complex and versatile queries for context-specific molecular interactions that are not currently available in other molecular interaction databases.
Integrated Bio-Entity Network: A System for Biological Knowledge Discovery
It is shown that the Integrated Bio-Entity Network (IBN) can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems but also provide guidance for experimental designs.
Community challenges in biomedical text mining over 10 years: success, failure and the future
This article reviews the different community challenge evaluations held from 2002 to 2014 and their respective tasks and examines these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively.
PubTator: a web-based text mining tool for assisting biocuration
PubTator is described: a web-based system for assisting biocuration that features a PubMed-like interface and is equipped with multiple challenge-winning text-mining algorithms to ensure the quality of its automatic results.