We report the results of a study into the use of a linear interpolating hidden Markov model (HMM) for the task of extracting technical terminology from MEDLINE abstracts and texts in the molecular-biology domain. This is the first stage in a system that will extract event information for automatically updating biology databases. We...
We present two measures for comparing corpora based on information theory statistics such as gain ratio as well as simple term-class frequency counts. We tested the predictions made by these measures about corpus difficulty in two domains, news and molecular biology, using the results of two well-used paradigms for NE, decision trees and HMMs, and found that...
We present an outline of the genome information acquisition (GENIA) project for automatically extracting biochemical information from journal papers and abstracts. GENIA will be available over the Internet and is designed to aid in information extraction, retrieval and visualisation and to help reduce information overload on researchers. The vast...
The tagging of Named Entities, the names of particular things or classes, is regarded as an important component technology for many NLP applications. The first Named Entity set had 7 types: organization, location, person, date, time, money and percent expressions. Later, the IREX project added artifact, and ACE added two more, GPE and facility, to pursue...
We have developed a sentence extraction system that estimates the significance of sentences by integrating four scoring functions that use as evidence sentence location, sentence length, TF/IDF values of words, and similarity to the title. Similarity to a given query is also added to the system in the summarization task for information retrieval. Parameters...
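The four kinds of evidence named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the individual scoring formulas, the equal default weights, the use of sentences as the "documents" for IDF, and all function names are assumptions made only for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Lowercased whitespace tokens with surrounding punctuation stripped (a simplification).
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def location_score(index, total):
    # Assumption: earlier sentences matter more; the first sentence scores 1.0.
    return 1.0 - index / total

def length_score(tokens, max_len):
    # Assumption: longer sentences score higher, normalized by the longest sentence.
    return len(tokens) / max_len if max_len else 0.0

def tfidf_score(tokens, tf, df, n_sents):
    # Mean TF*IDF over the sentence's tokens; IDF is computed over sentences (assumption).
    if not tokens:
        return 0.0
    return sum(tf[w] * math.log(n_sents / df[w]) for w in tokens) / len(tokens)

def title_score(tokens, title_tokens):
    # Fraction of distinct title words that also occur in the sentence.
    if not title_tokens:
        return 0.0
    return len(set(tokens) & title_tokens) / len(title_tokens)

def rank_sentences(sentences, title, weights=(1.0, 1.0, 1.0, 1.0)):
    # Combine the four scores as a weighted sum (hypothetical weights) and sort descending.
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    max_len = max((len(t) for t in tokenized), default=0)
    tf = Counter(w for t in tokenized for w in t)
    df = Counter(w for t in tokenized for w in set(t))
    title_tokens = set(tokenize(title))
    ranked = []
    for i, tokens in enumerate(tokenized):
        scores = (
            location_score(i, n),
            length_score(tokens, max_len),
            tfidf_score(tokens, tf, df, n),
            title_score(tokens, title_tokens),
        )
        total = sum(w * s for w, s in zip(weights, scores))
        ranked.append((total, i, sentences[i]))
    return sorted(ranked, key=lambda item: item[0], reverse=True)
```

Calling `rank_sentences(sentences, title)` returns `(score, index, sentence)` tuples, highest score first. A query-similarity term of the kind the abstract mentions for the information retrieval task could be added as a fifth score in the same way.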
Huge quantities of on-line medical texts such as Medline are available, and we would hope to extract as much useful information as possible from these resources, ideally automatically, with the aid of computer technologies. In particular, recent advances in Natural Language Processing (NLP) techniques raise new challenges and opportunities for...
UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first 'mirror' site to PMC, which, in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access...
Corpus annotation is now a key topic for all areas of natural language processing (NLP) and information extraction (IE) which employ supervised learning. With the explosion of results in molecular-biology there is an increased need for IE to extract knowledge to support database building and to search intelligently for information in online journal...