This is a landmark book. For anyone interested in language, in dictionaries and thesauri, or in natural language processing, the introduction, Chapters 1-4, and Chapter 16 are essential reading. (Select other chapters according to your special interests; see the chapter-by-chapter review.) These chapters provide a thorough introduction to the preeminent electronic…
We describe our experience in preparing the lexicon and sense-tagged corpora used in the English all-words and lexical sample tasks of SENSEVAL-2. The English lexical sample task is the result of a coordinated effort between the University of Pennsylvania, which provided training/test data for the verbs, and Adam Kilgarriff at Brighton, who…
Principles of lexical semantics developed in the course of building an on-line lexical database are discussed. The approach is relational rather than componential. The fundamental semantic relation is synonymy, which is required in order to define the lexicalized concepts that words can be used to express. Other semantic relations between these concepts are…
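The relational approach can be illustrated with a minimal sketch that assumes nothing about WordNet's actual file format or any library API: a concept (synset) is represented only by the set of synonymous word forms that can express it, and other semantic relations such as hypernymy are stored as links between synsets rather than between words. The class names, synset identifiers such as `car.n.01`, and the sample entries are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A lexicalized concept, defined only by the synonyms that can express it."""
    name: str          # hypothetical identifier, e.g. "car.n.01"
    lemmas: frozenset  # the set of synonymous word forms


class Lexicon:
    """Toy relational lexicon: words map to synsets, relations link synsets."""

    def __init__(self):
        self.synsets = {}    # synset name -> Synset
        self.relations = {}  # (relation, source synset) -> set of target synsets

    def add_synset(self, name, lemmas):
        self.synsets[name] = Synset(name, frozenset(lemmas))

    def relate(self, relation, source, target):
        # Semantic relations hold between concepts (synsets), not word forms.
        self.relations.setdefault((relation, source), set()).add(target)

    def senses(self, word):
        """Every concept this word form can express (one synset per sense)."""
        return sorted(n for n, s in self.synsets.items() if word in s.lemmas)

    def synonyms(self, word):
        """Other word forms sharing a synset with `word`, pooled across senses."""
        pooled = {lemma for n in self.senses(word) for lemma in self.synsets[n].lemmas}
        return pooled - {word}

    def related(self, relation, synset):
        return self.relations.get((relation, synset), set())


# Hypothetical sample entries
lex = Lexicon()
lex.add_synset("car.n.01", ["car", "auto", "automobile"])
lex.add_synset("car.n.02", ["car", "railcar", "railway car"])
lex.add_synset("motor_vehicle.n.01", ["motor vehicle", "automotive vehicle"])
lex.relate("hypernym", "car.n.01", "motor_vehicle.n.01")

print(lex.senses("car"))                    # ['car.n.01', 'car.n.02']
print(lex.related("hypernym", "car.n.01"))  # {'motor_vehicle.n.01'}
```

Because `car` belongs to two synsets it is polysemous, and the hypernym link is attached to the concept `car.n.01` only, so the railway sense does not share it.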
In this paper we discuss a persistent problem arising from polysemy: namely the difficulty of finding consistent criteria for making fine-grained sense distinctions, either manually or automatically. We investigate sources of human annotator disagreements stemming from the tagging for the English Verb Lexical Sample Task in the Senseval-2 exercise in…
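As a rough illustration of how fine-grained sense distinctions affect annotator agreement, the sketch below computes the fraction of instances on which two annotators chose the same sense, optionally after mapping fine senses to coarser groups. The sense labels, the grouping, and the use of simple observed agreement (rather than a chance-corrected measure such as Cohen's kappa) are assumptions for illustration, not data or methods taken from the Senseval-2 exercise.

```python
def observed_agreement(tags_a, tags_b, grouping=None):
    """Fraction of instances on which two annotators chose the same sense.

    If `grouping` maps fine-grained sense labels to coarser groups, tags are
    compared after mapping, so near-miss choices count as agreement.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("annotators must tag the same instances")
    if not tags_a:
        raise ValueError("no instances to compare")
    groups = grouping or {}
    same = sum(groups.get(a, a) == groups.get(b, b) for a, b in zip(tags_a, tags_b))
    return same / len(tags_a)


def disagreements(tags_a, tags_b):
    """Indices of instances where the two annotators chose different senses."""
    return [i for i, (a, b) in enumerate(zip(tags_a, tags_b)) if a != b]


# Hypothetical sense tags for five instances of one verb
annotator_1 = ["call-1", "call-2", "call-3", "call-1", "call-4"]
annotator_2 = ["call-1", "call-3", "call-3", "call-2", "call-4"]

# Hypothetical coarse grouping that merges two closely related senses
coarse = {"call-2": "G1", "call-3": "G1"}

print(observed_agreement(annotator_1, annotator_2))          # 0.6
print(observed_agreement(annotator_1, annotator_2, coarse))  # 0.8
print(disagreements(annotator_1, annotator_2))               # [1, 3]
```

Merging the two related senses turns one of the two disagreements into agreement, which is one way the granularity of a sense inventory shows up directly in agreement figures.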
WordNet, a ubiquitous tool for natural language processing, suffers from a sparsity of connections between its component concepts (synsets). Human annotators assigned a subset of the connections between 1000 hand-chosen synsets a value of "evocation", representing how much the first concept brings to mind the second. These data, along…
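One way to picture such data is as a directed, weighted graph over synsets, where each edge weight summarizes how strongly the source concept brings the target to mind. This sketch assumes that each ordered pair received several annotator ratings on a 0-100 scale that are then averaged; the scale, the averaging step, the function names, and the synset names are illustrative assumptions, not details taken from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (source synset, target synset) -> scores from
# several annotators. The 0-100 scale is an assumption for illustration.
ratings = {
    ("dog.n.01", "cat.n.01"): [70, 80, 75],
    ("cat.n.01", "dog.n.01"): [50, 60, 55],
    ("dog.n.01", "bone.n.01"): [60, 40, 50],
    ("dog.n.01", "piano.n.01"): [0, 5, 0],
}


def build_evocation_graph(ratings):
    """Average annotator scores into a directed, weighted adjacency map."""
    graph = defaultdict(dict)
    for (source, target), scores in ratings.items():
        graph[source][target] = mean(scores)
    return graph


def strongest_evoked(graph, source, k=2):
    """Return up to k targets that `source` brings to mind most strongly."""
    edges = graph.get(source, {})
    return sorted(edges, key=edges.get, reverse=True)[:k]


graph = build_evocation_graph(ratings)
print(strongest_evoked(graph, "dog.n.01"))  # ['cat.n.01', 'bone.n.01']
```

Edges are kept directed because evocation is measured from the first concept to the second, so the two directions of a pair (here `dog.n.01` to `cat.n.01` versus the reverse) can carry different weights.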
The Manually Annotated Sub-Corpus (MASC) project provides data and annotations to serve as the base for a community-wide annotation effort of a subset of the American National Corpus. The MASC infrastructure enables the incorporation of contributed annotations into a single, usable format that can then be analyzed as it is or ported to any of a variety of…
Arabic is the official language of hundreds of millions of people in twenty Middle Eastern and North African countries, and is the religious language of all Muslims of various ethnicities around the world. Surprisingly little has been done in the field of computerised language and lexical resources. This motivates the development of an Arabic…