GoTag: A Case Study in Using a Shared UK e-Science Infrastructure for the Automatic Annotation of Medline Documents

Abstract

In this paper we describe our efforts and experience in constructing GoTag, a distributed system for automatically annotating Medline documents with relevant Gene Ontology (GO) terms. The system is built on top of a service-based text mining infrastructure that integrates tools developed within the Discovery Net and myGrid projects. We have developed two baseline approaches to assigning GO terms. The first assigns GO terms by directly matching GO term names and synonyms in documents. The second uses a document classifier trained over feature-vector representations of documents that can be associated with GO terms via the manually curated yeast genome database. We present preliminary results from evaluating these two approaches and discuss proposals for enhancing both baselines and for constructing a hybrid approach.
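The abstract does not specify which matcher or classifier GoTag actually uses. As a rough illustration only, the sketch below shows one way each baseline could work. It assumes a toy lexicon mapping GO identifiers to names and synonyms, and a few documents already labelled with GO terms, which stand in for labels derived from the curated yeast genome database. The classifier is a simple nearest-centroid model with cosine similarity, swapped in for illustration. The function names, the GO identifiers, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def match_go_terms(text, lexicon):
    """Direct-match baseline: return GO IDs whose name or any synonym occurs in text."""
    found = set()
    lowered = text.lower()
    for go_id, names in lexicon.items():
        for name in names:
            # Word-boundary match so a short name does not fire inside a longer word.
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered):
                found.add(go_id)
                break
    return found


def vectorize(text):
    """Bag-of-words term-frequency vector (the assumed feature representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(labelled_docs):
    """labelled_docs: list of (text, set of GO IDs).

    Returns GO ID -> summed term vector. Cosine similarity ignores vector
    length, so the sum behaves the same as the mean (true centroid).
    """
    centroids = {}
    for text, go_ids in labelled_docs:
        vec = vectorize(text)
        for go_id in go_ids:
            centroids.setdefault(go_id, Counter()).update(vec)
    return centroids


def classify(text, centroids, threshold=0.3):
    """Classifier baseline: assign every GO ID whose centroid is similar enough."""
    vec = vectorize(text)
    return {g for g, c in centroids.items() if cosine(vec, c) >= threshold}


if __name__ == "__main__":
    # Hypothetical toy data, not real GO or yeast database content.
    lexicon = {
        "go_translation": ["translation", "protein biosynthesis"],
        "go_dna_repair": ["DNA repair"],
    }
    print(match_go_terms("Mutations impairing DNA repair in yeast were examined.", lexicon))

    training = [
        ("ribosome translation initiation factor", {"go_translation"}),
        ("dna repair mismatch excision", {"go_dna_repair"}),
    ]
    centroids = train_centroids(training)
    print(classify("translation at the ribosome", centroids))
```

A hybrid of the kind the abstract mentions could, for example, take the union or intersection of the two result sets, though how GoTag would combine them is not stated here.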

8 Figures and Tables

Cite this paper

@inproceedings{Davis2005GoTagAC,
  title={GoTag: A Case Study in Using a Shared UK e-Science Infrastructure for the Automatic Annotation of Medline Documents},
  author={Neema Davis and Robert Gaizauskas and Yashuang Guo and Henk Harkema and Ian Roberts and Vasa Curcin},
  year={2005}
}