This paper introduces a recently initiated project that focuses on building a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Our aim is to develop a linguistic resource with a deep formal semantic foundation in order to capture the richness of Arabic as described in Elkateb (2005). Arabic…
Arabic is the official language of hundreds of millions of people in twenty Middle Eastern and North African countries, and is the religious language of all Muslims of various ethnicities around the world. Surprisingly little has been done in the field of computerised language and lexical resources. It is therefore motivating to develop an Arabic…
We report on the design and partial implementation of a bilingual English-Arabic dictionary based on WordNet. A relational database is employed to store the lexical and conceptual relations, giving the database extensibility in either language. The data model is extended beyond an Arabic replication of the word↔sense relation to include the morphological…
Since the advent of annotated corpora, corpus-based methods have been widely used to tackle NLP tasks, with notable success. Inevitably, shifting from classical rule-based methods to corpus-based methods has a major drawback: most corpus-based methods produce statistical models that are hard to interpret and modify, along with their higher complexity in…
In this paper, we describe the system used by the UMIST team, as members of the FACILE consortium, to undertake the NE task in MUC-7. The main characteristics of the system are as follows: it is rule-based; its rule formalism supports context-sensitive partial parsing; rules may use pattern-matching-style iteration operators; the notation…
UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first 'mirror' site to PMC, which, in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access…
AthosMail is a multilingual spoken dialogue system for reading e-mail messages. The key features of the application are adaptivity and the integration of different approaches to spoken interaction. The application has a flexible system structure that supports multiple components serving either the same or different purposes. The AthosMail system includes components…
Text Mining is a relatively new area of research, very interesting for both computational linguists and data miners. It involves collecting and analyzing quantities of textual data by domain experts, whose main task is the manual revision of markup. We describe a suite of tools used to simplify the process: the Parmenides System that consists of data…
Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the…
We report on the current status of the Arabic WordNet project and in particular on the contents of the database, the lexicographer and user interfaces, the Arabic WordNet browser, linking to the SUMO ontology, the Arabic word spotter, and techniques for semi-automatically extending Arabic WordNet. The central focus of the presentation is on the…