The resolution of lexical ambiguity is important for most natural language processing tasks, and a range of computational techniques have been proposed for its solution. None of these has yet proven effective on a large scale. In this paper, we describe a method for lexical disambiguation of text using the definitions in a machine-readable dictionary …
We describe a method for obtaining subject-dependent word sets relative to some (subject) domain. Using the subject classifications given in the machine-readable version of Longman's Dictionary of Contemporary English, we established subject-dependent co-occurrence links between words of the defining vocabulary to construct these "neighborhoods". Here, we …
We describe a technique for automatically constructing a taxonomy of word senses from a machine-readable dictionary. Previous taxonomies developed from dictionaries have two properties in common. First, they are based on a somewhat loosely defined notion of the IS-A relation. Second, they require human intervention to identify the sense of the genus term …
Dictionaries and computation are two subjects not often brought together in the same article nor even the same proposition. This article explores the growing relations between these two entities and, in particular, investigates whether what is found in traditional dictionaries can be of service to those concerned with getting computers to process and …
This paper describes the largest-scale annotation project involving the Enron email corpus to date. Over 12,500 emails were classified, by humans, into the categories "Business" and "Personal", and then sub-categorised by type within these categories. The paper quantifies how well humans perform on this task (evaluated by inter-annotator agreement). It …
Until very recently, the email collections that have been available for research have been rather artificially created and consist of emails that contributors have chosen to make available. These collections serve very well for certain applications, but are certainly not representative of a person's email habits; thus, they have not been realistic resources …