Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction

@article{Santos2005WntPC,
  title={Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction},
  author={Carlos Santos and Daniela Eggle and David J. States},
  journal={Bioinformatics},
  year={2005},
  volume={21},
  number={8},
  pages={1653--1658}
}
MOTIVATION: Wnt signaling is a very active area of research with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases describing signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction have been identified as of late 2003. In this work we describe a natural language processing… 


Automatic pathway building in biological association networks

TLDR
The automatically curated MedScan data are adequate for automatically generating good-quality signaling networks; the algorithm for reconstructing signaling pathways is described and validated by comparison with manually curated pathways and tissue-specific gene expression profiles.

A text-mining system for extracting metabolic reactions from full-text articles

TLDR
It is concluded that automated metabolic pathway construction is more tractable than has often been assumed, and that relatively simple text-mining approaches can prove surprisingly effective.

Machine Learning Techniques for Establishing the Provenance of Biological Interactions in MEDLINE papers

  • 2005
TLDR
It is shown that a number of machine learning algorithms can be used to directly establish sentence-level support for given entity-entity interactions in biological databases, with a focus on finding support for particular interaction entries in database assertions about protein-protein interactions.

BSQA: integrated text mining using entity relation semantics extracted from biological literature of insects

TLDR
The BeeSpace question/answering (BSQA) system, which performs integrated text mining for insect biology covering diverse aspects from molecular interactions of genes to insect behavior, is presented.

The fully automated construction of metabolic pathways using text mining and knowledge-based constraints

TLDR
The development of the Literature Metabolic Pathway Extraction Tool (LiMPET), a text-mining tool designed for the automated extraction of metabolic pathways from article abstracts and full-text open-access articles is described.

Reconstruction of Protein-Protein Interaction Pathways by Mining Subject-Verb-Objects Intermediates

TLDR
This study constructed Muscorian using MontyLingua, a generic text processor, following a previously proposed two-layered generalization-specialization paradigm in which text is first generically processed into a suitable intermediate format before domain-specific data extraction techniques are applied at the specialization layer.

Machine Learning Techniques for Establishing the Provenance of Biological Interactions

TLDR
A number of machine learning algorithms can be used to directly establish sentence-level support for given entity-entity interactions in biological databases, with a specific focus on finding support for particular interaction entries in database assertions about protein-protein interactions.

Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies

TLDR
This work proposes establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP, and takes advantage of typical forms of representing experimental findings in terms of percentages of protein expression manifest by the tumor population under study.

New challenges for text mining: mapping between text and manually curated pathways

TLDR
New resources are constructed to link text with a model pathway, and their detailed analysis addresses an untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation.

Automatic extraction of biomolecular interactions: an empirical approach

TLDR
The conclusions reached in this work could be used by other systems to provide evidence about putative interactions, thus supporting efforts to maximize the ability of hybrid systems to support such tasks as annotating and constructing interaction networks.

References

Showing references 1-10 of 27.

Extracting human protein interactions from MEDLINE using a full-sentence parser

TLDR
MedScan is presented, a completely automated natural language processing-based information extraction system that was used to extract 2976 interactions between human proteins from MEDLINE abstracts dated after 1988; the results suggest that MEDLINE is a unique source of diverse protein function information, which can be extracted in a completely automated way with reasonably high precision.

Extraction of protein interaction information from unstructured text using a context-free grammar

TLDR
This work describes a system for extracting protein, gene and small molecule (PGSM) interactions from unstructured text using a lexical analyzer and a context-free grammar, and demonstrates that efficient parsers can be constructed for extracting these relationships from natural language with high rates of recall and precision.
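The lexer-plus-grammar approach summarized above can be illustrated with a toy sketch. This is not the cited system: the lexicon (`ENTITIES`, `VERBS`), the two-rule grammar, and all function and class names are hypothetical, and words outside the lexicon are simply discarded before parsing, which is a crude simplifying assumption.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical toy lexicon; a real system would rely on curated dictionaries.
ENTITIES = {"Wnt", "Frizzled", "Dishevelled", "GSK3", "Axin", "beta-catenin"}
VERBS = {"binds", "activates", "inhibits", "phosphorylates"}

Token = Tuple[str, str]        # (grammar category, surface text)
Triple = Tuple[str, str, str]  # (agent, verb, target)


class ParseError(Exception):
    pass


def lex(sentence: str) -> List[Token]:
    """Lexical analysis: assign each known word a grammar category."""
    tokens: List[Token] = []
    for word in re.findall(r"[A-Za-z0-9-]+", sentence):
        if word in ENTITIES:
            tokens.append(("ENT", word))
        elif word in VERBS:
            tokens.append(("VERB", word))
        elif word == "and":
            tokens.append(("AND", word))
        # Any other word is dropped (simplifying assumption).
    return tokens


class Parser:
    """Recursive-descent parser for the toy grammar:
         S    -> LIST VERB LIST
         LIST -> ENT | ENT AND LIST
    """

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, category: str) -> str:
        if self.peek() != category:
            raise ParseError(f"expected {category} at token {self.pos}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def entity_list(self) -> List[str]:
        names = [self.expect("ENT")]
        if self.peek() == "AND":
            self.expect("AND")
            names.extend(self.entity_list())
        return names

    def sentence(self) -> List[Triple]:
        agents = self.entity_list()
        verb = self.expect("VERB")
        targets = self.entity_list()
        if self.peek() is not None:
            raise ParseError("unexpected trailing tokens")
        # Distribute coordinated entities into pairwise interaction triples.
        return [(a, verb, t) for a in agents for t in targets]


def extract(sentence: str) -> Optional[List[Triple]]:
    """Return interaction triples, or None if the sentence does not parse."""
    try:
        return Parser(lex(sentence)).sentence()
    except ParseError:
        return None


print(extract("Dishevelled inhibits GSK3 and Axin"))
# [('Dishevelled', 'inhibits', 'GSK3'), ('Dishevelled', 'inhibits', 'Axin')]
```

Dropping unknown words lets sentences with adverbs or prepositional phrases parse, at the cost of extra false positives; a fuller grammar would keep and attach those words instead.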

Mining literature for protein-protein interactions

TLDR
It is shown that the frequencies of words in Medline abstracts can be used to determine whether or not a given paper discusses protein-protein interactions, and the relevant information can be captured for the Database of Interacting Proteins.
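Word-frequency classification of abstracts can be sketched as follows. The cited work's exact statistic is not reproduced here: a multinomial naive Bayes-style log-odds score with add-one smoothing and equal class priors is assumed as a stand-in, and the class, method names, and training sentences are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class WordFrequencyClassifier:
    """Flags abstracts that likely discuss protein-protein interactions.

    Illustrative stand-in: multinomial naive Bayes over word counts,
    add-one (Laplace) smoothing, equal class priors.
    """

    def __init__(self, positive: Iterable[str], negative: Iterable[str]) -> None:
        self.pos_counts = Counter(w for doc in positive for w in tokenize(doc))
        self.neg_counts = Counter(w for doc in negative for w in tokenize(doc))
        self.vocab_size = len(set(self.pos_counts) | set(self.neg_counts))
        self.pos_total = sum(self.pos_counts.values())
        self.neg_total = sum(self.neg_counts.values())

    def log_odds(self, text: str) -> float:
        """Sum of per-word log-likelihood ratios, positive vs. negative class."""
        score = 0.0
        for w in tokenize(text):
            p = (self.pos_counts[w] + 1) / (self.pos_total + self.vocab_size)
            q = (self.neg_counts[w] + 1) / (self.neg_total + self.vocab_size)
            score += math.log(p / q)
        return score

    def discusses_interactions(self, text: str) -> bool:
        return self.log_odds(text) > 0.0


clf = WordFrequencyClassifier(
    positive=["Protein A binds protein B", "The complex interacts with the receptor"],
    negative=["Gene expression was measured in tissue", "Mutation frequency in the population"],
)
print(clf.discusses_interactions("B binds the receptor"))  # True
```

Words never seen in training contribute a ratio of exactly 1 (log-odds 0), so they neither help nor hurt the decision.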

Detecting Gene Relations from MEDLINE Abstracts

TLDR
The relative computational simplicity of the proposed method makes it possible to process and analyze large volumes of data in a short time and significantly contributes to and enhances a user's ability to discover such embedded information.

Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions

TLDR
The basic design of a system for automatic detection of protein-protein interactions extracted from scientific abstracts is described and the feasibility of developing a fully automated system able to describe networks of protein interactions with sufficient accuracy is demonstrated.

Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource

TLDR
The Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes, is developed, which contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein-protein, protein-gene, and protein-compound interaction data, domain information, and structural information.

Using text analysis to identify functionally coherent gene groups

TLDR
A method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature is presented and achieves 79% sensitivity at 100% specificity, comparing favorably to other tested methods.

Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts

  • B. Stapley, G. Benoît
  • Pacific Symposium on Biocomputing
  • 2000
TLDR
A prototype system is presented for retrieving and visualizing information from literature and genomic databases using gene names; it is a tool for efficiently exploring the biomedical information landscape and may act as an inference network.
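Gene-name co-occurrence, the signal behind this kind of literature network, can be counted as shown here. This is a minimal sketch, not the authors' implementation: exact-match lookup against a supplied name list is assumed (no synonym or case handling), the function name and example abstracts are hypothetical, and the pair counts would serve as edge weights in a visualization.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrence_counts(abstracts: Iterable[str], gene_names: Set[str]) -> Counter:
    """Count the abstracts in which each unordered pair of gene names co-occurs."""
    counts: Counter = Counter()
    for text in abstracts:
        words = set(re.findall(r"[A-Za-z0-9-]+", text))
        # Sort so each pair has one canonical (lexicographic) orientation.
        found = sorted(words & gene_names)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


abstracts = [
    "Wnt binds Frizzled.",
    "Frizzled recruits Dishevelled; Wnt is required.",
    "Axin promotes degradation of beta-catenin.",
]
genes = {"Wnt", "Frizzled", "Dishevelled", "Axin", "beta-catenin"}
edges = cooccurrence_counts(abstracts, genes)
print(edges.most_common(1))  # [(('Frizzled', 'Wnt'), 2)]
```

Counting each pair at most once per abstract keeps a single repetitive abstract from dominating an edge weight.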

Automatic Annotation for Biological Sequences by Extraction of Keywords from MEDLINE Abstracts: Development of a Prototype System

TLDR
A prototype for the automatic annotation of functional characteristics in protein families able to extract biological information directly from scientific literature in the form of MEDLINE abstracts is developed.

TEXTQUEST: Document Clustering of MEDLINE Abstracts For Concept Discovery In Molecular Biology

TLDR
An algorithm for large-scale document clustering of biological text, obtained from Medline abstracts, based on statistical treatment of terms, stemming, the idea of a 'go-list', unsupervised machine learning and graph layout optimization is presented.