The next generation of literature analysis: Integration of genomic analysis into text mining

@article{Scherf2005TheNG,
  title={The next generation of literature analysis: Integration of genomic analysis into text mining},
  author={Matthias Scherf and Anton Epple and Thomas Werner},
  journal={Briefings in Bioinformatics},
  year={2005},
  volume={6},
  number={3},
  pages={287--297}
}
Text-mining systems are indispensable tools for filtering the growing flood of scientific literature down to the topics relevant to a particular research interest. Most of the scientific literature is published as unstructured free text, which complicates the development of data-processing tools that rely on structured information. To overcome the problems of free-text analysis, structured, hand-curated information derived from the literature is integrated into text-mining systems to improve… 


Mining literature for systems biology

  • P. Roberts
  • Briefings in Bioinformatics
  • 2006
These uses of literature, specifically manual curation, derived concepts captured in ontologies and databases, and indirect and direct application of text mining, will be discussed as they pertain to systems biology.

Enhancing Data Integration with Text Analysis to Find Proteins Implicated in Plant Stress Response

An extension to the Ondex data integration framework is presented that applies text-mining techniques to MEDLINE abstracts, so that both bodies of evidence (integrated data and the scientific literature) can be accessed in a consistent way. Using the literature, it highlights proteins that would not have been found by data integration alone.

PubRunner: A light-weight framework for updating text mining results

PubRunner is a lightweight, easy-to-use framework for regularly running text-mining tools on the latest publications; it can be integrated with an existing text-mining tool.

PubRunner: A light-weight framework for updating text mining results

PubRunner is a framework for regularly running text-mining tools on the latest publications. It is a proof of concept that, the authors hope, will encourage text-mining developers to build tools that truly aid biologists in exploring the latest publications.

Validating text mining results on protein-protein interactions using gene expression profiles

  • Deyu Zhou, Yulan He, C. Kwoh
  • 2006 International Conference on Biomedical and Pharmaceutical Engineering
  • 2006
A probability model is proposed to score the confidence of protein-protein interactions based on both text mining results and gene expression profiles, and experimental results are presented to show the feasibility of this framework.

Mining online full-text literature for novel protein interaction discovery

This article presents a method for extracting novel protein interactions from online full-text articles for biomarker discovery: by evaluating support and confidence metrics, both explicit and implicit protein interactions are extracted from a corpus of articles.

Text mining for traditional Chinese medical knowledge discovery: A survey

Biomedical text mining and its applications in cancer research

Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

A state-of-the-art natural language processing system creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. The work uses it to examine how automatic annotation with text-mining techniques can complement the manual curation of biological databases.

References

Showing 1-10 of 55 references

Mining the Biomedical Literature in the Genomic Era: An Overview

This paper surveys the disciplines involved in unstructured-text analysis, categorizes current work in biomedical literature mining with respect to these disciplines, and provides examples of text analysis methods applied towards meeting some of the current challenges in bioinformatics.

Content-rich biological network constructed by mining PubMed abstracts

Chilibot distills scientific relationships from knowledge available throughout a wide range of biological domains and presents these in a content-rich graphical format, thus integrating general biomedical knowledge with the specialized knowledge and interests of the user.

Information extraction in molecular biology

The general field of information extraction is introduced, the status of the applications in molecular biology is outlined, and some ideas about possible standards for evaluation that are needed for the future development of the field are discussed.

Accomplishments and challenges in literature data mining for biology

To encourage participation and accelerate progress in the expanding field of literature data mining, the authors propose creating challenge evaluations and describe two specific applications in this context.

Toward information extraction: identifying protein names from biological papers

A new method for extracting material names, PROPER, is proposed. Using surface clues in character strings, it extracts material names from sentences with 94.70% precision and 98.84% recall, regardless of whether a name is already known or newly defined.

GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles

A system is presented that extracts and structures information about cellular pathways from the biological literature in accordance with a knowledge model that was developed earlier and implemented by modifying an existing medical natural language processing system.

MedScan, a natural language processing engine for MEDLINE abstracts

A general biomedical domain-oriented NLP engine called MedScan is presented that efficiently processes sentences from MEDLINE abstracts and produces a set of regularized logical structures representing the meaning of each sentence.

Tagging gene and protein names in biomedical text

This work proposes to approach the detection of gene and protein names in scientific abstracts as part-of-speech tagging, the most basic form of linguistic corpus annotation, and demonstrates that this method can be applied to large sets of MEDLINE abstracts, without the need for special conditions or human experts to predetermine relevant subsets.