The resolution of lexical ambiguity is important for most natural language processing tasks, and a range of computational techniques have been proposed for its solution. None of these has yet proven effective on a large scale. In this paper, we describe a method for lexical disambiguation of text using the definitions in a machine-readable dictionary…
We describe a method for obtaining subject-dependent word sets relative to some (subject) domain. Using the subject classifications given in the machine-readable version of Longman's Dictionary of Contemporary English, we established subject-dependent co-occurrence links between words of the defining vocabulary to construct these "neighborhoods". Here, we…
Data sparsity is a major problem in natural language processing: language is a system of rare events, so varied and complex that even with an extremely large corpus we can never accurately model all possible strings of words. This paper examines the use of skip-grams (a technique whereby n-grams are still stored to model…
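The abstract above is cut off before it defines skip-grams, so the following is only a minimal sketch under a stated assumption: a k-skip-n-gram is taken to be any in-order sequence of n tokens from a sentence that skips at most k tokens in total. The function name `skip_grams` and the example sentence are hypothetical and not from the paper.

```python
from itertools import combinations


def skip_grams(tokens, n, k):
    """Return k-skip-n-grams as tuples of tokens.

    Assumed definition (not confirmed by the truncated abstract):
    an n-token subsequence, in original order, whose first and last
    tokens are separated by at most k skipped tokens in total.
    With k = 0 this reduces to ordinary contiguous n-grams.
    """
    grams = []
    for start in range(len(tokens) - n + 1):
        # The gram may span at most n + k positions beginning at `start`.
        window_end = min(start + n + k, len(tokens))
        # Choose the remaining n - 1 positions from the rest of the window.
        for rest in combinations(range(start + 1, window_end), n - 1):
            positions = (start,) + rest
            grams.append(tuple(tokens[i] for i in positions))
    return grams


if __name__ == "__main__":
    sentence = "the cat sat on mat".split()
    for gram in skip_grams(sentence, n=2, k=1):
        print(" ".join(gram))
```

Allowing skips yields more grams from the same sentence than contiguous n-grams do, which is why the technique is relevant to sparse data: more word combinations are observed without enlarging the corpus.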
A funny title; I surmise that it will often be misquoted as Electronic Words. Is there a hidden citation behind it? I haven't been able to trace it. Electric Words (henceforth EW, also used to refer jointly to the three authors) is a report on work done to, and with, machine-readable dictionaries, in particular LDOCE, the Longman Dictionary of Contemporary…
We describe a technique for automatically constructing a taxonomy of word senses from a machine-readable dictionary. Previous taxonomies developed from dictionaries have two properties in common. First, they are based on a somewhat loosely defined notion of the IS-A relation. Second, they require human intervention to identify the sense of the genus term…
This paper describes the largest scale annotation project involving the Enron email corpus to date. Over 12,500 emails were classified, by humans, into the categories "Business" and "Personal", and then sub-categorised by type within these categories. The paper quantifies how well humans perform on this task (evaluated by inter-annotator agreement). It…
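The abstract does not say which agreement statistic the paper uses. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two annotators; the function name `cohen_kappa` and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    from each annotator's own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels, invented for illustration.
    annotator_1 = ["Business", "Business", "Personal", "Personal"]
    annotator_2 = ["Business", "Personal", "Personal", "Personal"]
    print(cohen_kappa(annotator_1, annotator_2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the statistic separates genuine consensus from agreement that the label distribution alone would produce.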
In this paper, we describe a method of extracting information from an on-line resource for the construction of lexical entries for a multilingual, interlingual MT system (ULTRA). We have been able to automatically generate lexical entries for interlingual concepts corresponding to nouns, verbs, adjectives and adverbs. Although several features of these…
Machine Readable Dictionaries (MRDs) contain much useful information about language. Researchers have worked for the last decade on ways to extract this information for language processing systems. But processing dictionaries for use in natural language computation is itself a difficult problem. Transforming information from a version designed for human…