The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study on Arabidopsis

  title={The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study on Arabidopsis},
  author={Sofie Van Landeghem and Stefanie De Bodt and Zuzanna J Drebert and Dirk Inz{\'e} and Yves van de Peer},
  journal={Plant Cell},
  pages={794 - 807}
Manual evaluation of state-of-the-art text mining data reveals promising results for its application in plant network biology. Focusing on Arabidopsis thaliana, an integrated network of text mining and experimental data highlights the complementarity of these resources and the need for text mining tools that uncover the latest relevant findings from the literature. Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden… 
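
A minimal sketch of how the complementarity of the two resources could be measured: treat text-mined and experimental interactions as sets of undirected gene pairs and compare them. The gene names and edges are hypothetical placeholders, not data from the study.

```python
def normalize(edges):
    """Treat interactions as undirected: store each pair in sorted order."""
    return {tuple(sorted(edge)) for edge in edges}

def compare_networks(text_mined, experimental):
    """Split the integrated network into shared and source-specific edges."""
    tm = normalize(text_mined)
    ex = normalize(experimental)
    return {
        "shared": tm & ex,
        "text_mining_only": tm - ex,
        "experimental_only": ex - tm,
        "integrated": tm | ex,
    }

# Hypothetical edges for illustration only.
text_mined = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_B"), ("GENE_D", "GENE_E")]
experimental = [("GENE_B", "GENE_A"), ("GENE_E", "GENE_F")]

result = compare_networks(text_mined, experimental)
for key, edges in result.items():
    print(key, sorted(edges))
```

Edges found only by text mining are the ones an experimental-only network would miss, which is the kind of complementarity the summary refers to.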

EVEX in ST’13: Application of a large-scale text mining resource to event extraction and network construction

This paper describes the participation in the latest Shared Task using the large-scale text mining resource EVEX, which was previously built with state-of-the-art algorithms and applied to the whole of PubMed and PubMed Central.

Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait

A novel time-based analysis of knowledge networks for potato tuber flesh color indicates that network-assisted hypothesis generation shows promise for knowledge discovery and data integration in scientific research.
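
One way to picture a time-based network analysis is to record the year in which each pair of entities is first co-mentioned and then take cumulative snapshots. This is a sketch assuming simple document-level co-occurrence; the entity names and years are invented for illustration, and the paper's actual extraction pipeline may differ.

```python
from itertools import combinations

def first_seen_edges(documents):
    """Map each co-mentioned entity pair to the earliest year it appears together."""
    first_year = {}
    for year, entities in documents:
        for a, b in combinations(sorted(set(entities)), 2):
            if (a, b) not in first_year or year < first_year[(a, b)]:
                first_year[(a, b)] = year
    return first_year

def snapshot(first_year, up_to):
    """Edges known from literature published up to and including `up_to`."""
    return {edge for edge, year in first_year.items() if year <= up_to}

# Hypothetical (year, entities mentioned together) records.
docs = [
    (2005, ["GENE_X", "flesh color"]),
    (2010, ["GENE_X", "carotenoid content"]),
    (2015, ["carotenoid content", "flesh color", "GENE_X"]),
]
first = first_seen_edges(docs)
for year in (2005, 2010, 2015):
    print(year, len(snapshot(first, year)))
```

Comparing successive snapshots shows which links appeared late, which is where network-assisted hypothesis generation would look for new leads.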

Information Extraction for the Seed Development Regulatory Networks of Arabidopsis thaliana

This work proposes Information Extraction (IE) as an efficient approach for producing structured, usable information on biology, by presenting a complete IE task on a model biological organism, Arabidopsis thaliana.

Manually curated database of rice proteins

The feasibility of digitizing the experimental data itself is demonstrated by creating a database of rice proteins based on in-house data curation models; it contains data for more than 1,800 rice proteins curated from more than 4,000 experiments reported in over 400 research articles.

Application of the EVEX resource to event extraction and network construction: Shared Task entry and result analysis

A re-ranking approach that incorporates features from the EVEX resource is used to improve the precision of an existing event extraction system, and a novel machine learning-based conversion system is implemented and benchmarked against the original rule-based system.
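
The idea of re-ranking can be sketched as rescoring each candidate event with an extra feature and discarding low-scoring candidates, trading some recall for precision. In this sketch `evex_support` is a hypothetical count of matching events in EVEX, and the hand-set linear score (with hypothetical `weight` and `threshold` values) stands in for the learned re-ranker described in the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Event:
    trigger: str
    theme: str
    base_score: float   # confidence from the original extraction system
    evex_support: int   # hypothetical feature: matching events found in EVEX

def rerank(events, weight=0.1, threshold=0.5):
    """Combine the base score with a log-scaled support feature, then filter."""
    def score(event):
        return event.base_score + weight * math.log1p(event.evex_support)
    ranked = sorted(events, key=score, reverse=True)
    return [event for event in ranked if score(event) >= threshold]

candidates = [
    Event("binding", "P1", 0.45, 20),
    Event("phosphorylation", "P2", 0.60, 0),
    Event("expression", "P3", 0.40, 0),
]
kept = rerank(candidates)
print([event.theme for event in kept])
```

The log scaling keeps a handful of very frequent EVEX events from dominating the base confidence.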

RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information

RLIMS-P version 2.0, an enhanced rule-based information extraction system for mining kinase, substrate, and phosphorylation site information from scientific literature, is introduced, including the capability of processing full-text articles and generalizability towards different post-translational modifications (PTMs).
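
Rule-based systems such as RLIMS-P rely on patterns over the text. As a much-simplified illustration (not RLIMS-P's actual rules), a single regular expression can normalize phosphorylation-site mentions such as "Ser-15" or "Thr308" to one-letter residue codes. A real system also needs context rules, for example to avoid reading "T7" as threonine 7.

```python
import re

# Map residue spellings to one-letter codes.
RESIDUES = {
    "serine": "S", "ser": "S", "s": "S",
    "threonine": "T", "thr": "T", "t": "T",
    "tyrosine": "Y", "tyr": "Y", "y": "Y",
}

# Longer spellings come first so "Ser-15" is not read as "S" + "er-15".
SITE_PATTERN = re.compile(
    r"\b(serine|threonine|tyrosine|ser|thr|tyr|s|t|y)[-\s]?(\d{1,4})\b",
    re.IGNORECASE,
)

def extract_sites(sentence):
    """Return phosphorylation-site mentions as residue code + position."""
    return [RESIDUES[m.group(1).lower()] + m.group(2)
            for m in SITE_PATTERN.finditer(sentence)]

print(extract_sites("Kinase X phosphorylates protein P at Ser-15 and Thr308."))
```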

Machine Learning–Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis

An ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA), is presented and applied to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana.

Machine Learning–Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis

This work presents a machine learning–based method for transcriptome analysis via comparison of gene coexpression networks, which outperforms traditional statistical tests at identifying stress-related genes, and applies this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana.

The research on gene-disease association based on text-mining of PubMed

A novel method that integrates the MeSH database, term weight (TW), and co-occurrence methods predicts gene-disease associations from the cosine similarity between gene vectors and disease vectors, and it outperforms heterogeneous network edge prediction (HNEP) in both precision and recall.
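
The cosine similarity used here is the dot product of a gene vector and a disease vector divided by the product of their lengths. A minimal sketch, assuming both are sparse term-weight vectors over MeSH terms; the terms and weights below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical MeSH-term weights for one gene and two diseases.
gene = {"Apoptosis": 0.8, "Cell Proliferation": 0.5}
diseases = {
    "disease_1": {"Apoptosis": 0.6, "Cell Proliferation": 0.4, "Inflammation": 0.2},
    "disease_2": {"Inflammation": 0.9},
}
scores = {name: cosine(gene, vec) for name, vec in diseases.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A disease sharing no MeSH terms with the gene scores 0, so ranking diseases by this score favors those described with the same vocabulary.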

Combining literature text mining with microarray data: advances for system biology modeling

An easy-to-use and freely accessible tool, GeneWizard, that exploits text mining and microarray data fusion to support researchers in discovering gene-disease relationships is described.

PLAN2L: a web tool for integrated text mining and literature-based bioentity relation extraction

PLAN2L, a web-based online search system that integrates text mining and information extraction techniques to systematically access information useful for analyzing genetic, cellular and molecular aspects of the plant model organism Arabidopsis thaliana, is presented.

Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization

This study investigates how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text and maps them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins to broad gene families.
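
Multi-level normalization can be pictured as a chain of lookups: a mention is first canonicalized to a symbol, the symbol is mapped to candidate gene identifiers, and those identifiers are mapped to families. The lookup tables here are hypothetical; when a symbol is ambiguous at the gene level, the family level may still resolve it to a single answer.

```python
import re

def canonicalize(mention):
    """Level 1: lowercase and strip non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", mention.lower())

# Hypothetical lookup tables, for illustration only.
SYMBOL_TO_GENES = {"abc1": ["gene:101", "gene:202"], "xyz2": ["gene:303"]}
GENE_TO_FAMILY = {"gene:101": "FAM_A", "gene:202": "FAM_A", "gene:303": "FAM_B"}

def normalize(mention):
    """Return the mention resolved at symbol, gene and family level."""
    symbol = canonicalize(mention)
    genes = SYMBOL_TO_GENES.get(symbol, [])
    families = sorted({GENE_TO_FAMILY[g] for g in genes if g in GENE_TO_FAMILY})
    return {"symbol": symbol, "genes": genes, "families": families}

print(normalize("ABC-1"))
```

"ABC-1" maps to two candidate genes but a single family, so a family-level query can use it even though gene-level disambiguation fails.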

Text mining for the biocuration workflow

Analysis of interviews and a survey provides a set of requirements for integrating text mining into the biocuration workflow; these requirements can guide the identification of common needs across curated databases and encourage joint experimentation among biocurators, text mining developers and the wider biomedical research community.

EVEX: A PubMed-Scale Resource for Homology-Based Generalization of Text Mining Predictions

A recent dataset of biomolecular events extracted from text is refined, and a disambiguation algorithm is implemented that uniquely links the arguments of 11.2 million biomolecular events to well-defined gene families, providing opportunities for query expansion and hypothesis generation.
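
Homology-based generalization can be sketched by keying each event on the families of its arguments instead of the individual genes, so that a query about one gene pair also retrieves evidence for homologous pairs. The family assignments and events below are hypothetical.

```python
from collections import defaultdict

# Hypothetical gene-to-family assignments (e.g. from homology clustering).
FAMILY = {"geneA1": "FAM1", "geneA2": "FAM1", "geneB1": "FAM2", "geneB2": "FAM2"}

def generalize(events):
    """Group (regulator, target, type) events by the families of their arguments."""
    grouped = defaultdict(list)
    for regulator, target, event_type in events:
        key = (FAMILY.get(regulator, regulator), FAMILY.get(target, target), event_type)
        grouped[key].append((regulator, target))
    return grouped

def expand_query(gene, target, event_type, grouped):
    """Return evidence for homologous pairs, even if this exact pair was never seen."""
    key = (FAMILY.get(gene, gene), FAMILY.get(target, target), event_type)
    return grouped.get(key, [])

events = [
    ("geneA1", "geneB1", "Positive_regulation"),
    ("geneA2", "geneB2", "Positive_regulation"),
    ("geneA1", "geneB2", "Binding"),
]
grouped = generalize(events)
print(expand_query("geneA2", "geneB1", "Positive_regulation", grouped))
```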

CORNET: A User-Friendly Tool for Data Mining and Integration

This work presents CORNET (for CORrelation NETworks), an access point to transcriptome, protein interactome, and localization data and functional information on Arabidopsis, together with two flexible and versatile tools: the coexpression tool and the protein-protein interaction tool.

Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations

This paper presents a freely available web application built on top of 21.3 million detailed biomolecular events extracted from all PubMed abstracts. The events were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations that account for lexical variants and synonymy.
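
Indirect associations can be sketched as two-hop paths in the event network: genes that are never directly linked to a query gene but share an intermediate partner with it. This is a minimal sketch over undirected, hypothetical gene pairs; it ignores event types and directions.

```python
from collections import defaultdict

def build_graph(pairs):
    """Undirected adjacency sets from (gene, gene) event pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def indirect_partners(graph, gene):
    """Genes reachable only through one intermediate, mapped to those intermediates."""
    direct = graph.get(gene, set())
    result = defaultdict(set)
    for mid in direct:
        for other in graph[mid]:
            if other != gene and other not in direct:
                result[other].add(mid)
    return dict(result)

pairs = [("G1", "G2"), ("G2", "G3"), ("G1", "G4"), ("G4", "G3"), ("G3", "G5")]
graph = build_graph(pairs)
print(indirect_partners(graph, "G1"))
```

Listing the intermediates alongside each indirect partner keeps the supporting evidence inspectable, which matters when the link is a hypothesis rather than a reported fact.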

Web-Queryable Large-Scale Data Sets for Hypothesis Generation in Plant Biology

This review provides an overview of several genomic, epigenomic, transcriptomic, proteomic, and metabolomic data sets and describes Web-based tools for querying them in the context of hypothesis generation for plant biology.

Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR

The study finds that, with organism-specific modifications, dictyBase and TAIR can use Textpresso to annotate gene products to GO's Cellular Component (CC) ontology.
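
Textpresso-style retrieval works by marking which term categories occur in each sentence and returning sentences that contain all requested categories. This is a much-simplified sketch, with tiny hypothetical category lexicons standing in for Textpresso's curated ones, that retrieves candidate sentences for Cellular Component curation.

```python
import re

# Hypothetical term categories; Textpresso uses curated category lexicons.
CATEGORIES = {
    "gene": {"abc1", "xyz2"},
    "cellular_component": {"nucleus", "plasma membrane", "chloroplast"},
}

def categories_in(sentence):
    """Return the set of categories with at least one whole-word term match."""
    text = sentence.lower()
    found = set()
    for category, terms in CATEGORIES.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found.add(category)
                break
    return found

def candidate_sentences(sentences, required=("gene", "cellular_component")):
    """Keep sentences that mention every required category."""
    return [s for s in sentences if set(required) <= categories_in(s)]

sentences = [
    "ABC1-GFP localizes to the nucleus.",
    "XYZ2 is expressed in roots.",
    "The chloroplast envelope was isolated.",
]
print(candidate_sentences(sentences))
```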

CORNET 2.0: integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations.

The new functionalities of CORNET 2.0 for data integration in plants are presented, including the integration of regulatory interaction datasets accessible through the new transcription factor (TF) tool that can be used in combination with the coexpression tool or the PPI tool.
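
Combining the TF tool with the coexpression tool can be pictured as filtering regulatory interactions by coexpression support. In this sketch the interactions, correlation values and the 0.7 cutoff (`min_r`) are all hypothetical, and using |r| treats strong negative correlation (possible repression) as support.

```python
def supported_regulations(regulatory, coexpression, min_r=0.7):
    """Keep TF -> target interactions whose coexpression |r| meets the cutoff."""
    kept = []
    for tf, target in regulatory:
        # Coexpression is symmetric, so look the pair up in either order.
        r = coexpression.get((tf, target), coexpression.get((target, tf)))
        if r is not None and abs(r) >= min_r:
            kept.append((tf, target, r))
    return kept

# Hypothetical regulatory interactions and correlation values.
regulatory = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G4")]
coexpression = {("TF1", "G1"): 0.85, ("G2", "TF1"): -0.75, ("TF2", "G3"): 0.2}
print(supported_regulations(regulatory, coexpression))
```

Pairs without any coexpression value (here TF2-G4) are dropped rather than assumed to be supported.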