In this paper, we describe the system used by the UMIST team, as members of the FACILE consortium, to undertake the NE task in MUC-7. The main characteristics of the system are as follows: it is rule-based; its rule formalism supports context-sensitive partial parsing; rules may use pattern-matching-style iteration operators; the notation…
We report on the design and partial implementation of a bilingual English-Arabic dictionary based on WordNet. A relational database is employed to store the lexical and conceptual relations, giving the database extensibility in either language. The data model is extended beyond an Arabic replication of the word↔sense relation to include the morphological…
Corpus-based methods have been widely used, with notable success, to tackle NLP tasks since the advent of annotated corpora. Inevitably, the shift from classical rule-based to corpus-based methods has a major drawback: most corpus-based methods produce statistical models that are hard to interpret and modify, along with their higher complexity in…
UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first 'mirror' site to PMC, which, in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access…
Text Mining is a relatively new area of research, of great interest to both computational linguists and data miners. It involves collecting and analyzing quantities of textual data by domain experts, whose main task is the manual revision of markup. We describe a suite of tools used to simplify the process: the Parmenides System, which consists of data…
Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the…
Successfully managing information means being able to find relevant new information and to correctly integrate it with pre-existing knowledge. Much information is nowadays stored as multilingual textual data; advanced classification systems are therefore currently considered strategic components for effective knowledge management. We describe an…
We describe our last results at the CoNLL2002 shared task of Named Entity Recognition and Classification using two approaches that we first applied to other NLL problems. We have been developing our own modified TBL learner, initially to tackle the Part-of-Speech tagging problem, for integration in a hybrid NLL and rule-based system for information extraction…
This paper describes an advanced system for multilingual text classification adaptable to different user needs. The system was initially developed as an applied research project involving research centres, industrial bodies and end-user organizations. The project is a considerable success story in the financial field. Three different successful…