George Demetriou

Information extraction technology, as defined and developed through the U.S. DARPA Message Understanding Conferences (MUCs), has proved successful at extracting information primarily from newswire texts and primarily in domains concerned with human activity. In this paper we consider the application of this technology to the extraction of information from …
MOTIVATION: The rapid increase in the volume of protein structure literature means useful information may be hidden or lost in the published literature, and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow. RESULTS: We describe the Protein Active Site Template Acquisition (PASTA) system, …
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient …
The Clinical E-Science Framework (CLEF) project is building a framework for the capture, integration and presentation of clinical information: for clinical research, evidence-based health care and genotype-meets-phenotype informatics. A significant portion of the information required by such a framework originates as text, even in EHR-savvy organizations. …
In this paper we describe the application of automatic terminology recognition and classification techniques for two bioinformatics projects: extraction of information about enzymes and metabolic pathways and extraction of information about protein structure, in both cases from scientific journal papers. The techniques we use were adapted from already …
This paper presents a case study of the development of an interface to a novel and complex form of document retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. A study involving …
A significant amount of important information in Electronic Health Records (EHRs) is often found only in the unstructured part of patient narratives, making it difficult to process and utilize for tasks such as evidence-based health care or clinical research. In this paper we describe the work carried out in the CLEF project for the semantic annotation of a …
We explore the use of Support Vector Machines to recognize personal health information in medical discharge summaries. In addition to the basic token-level features, we use entities recognized by an information extraction system designed for newswire text, plus a set of rules that incorporate entity-specific knowledge. The results on the unseen test dataset …
Information Extraction (IE), defined as the activity of extracting structured knowledge from unstructured text sources, offers new opportunities for the exploitation of biological information contained in the vast amounts of scientific literature. But while IE technology has received increasing attention in the area of molecular biology, there have not been …