Louise Guthrie

A funny title--I surmise that it will often be misquoted as Electronic Words. Is there a hidden citation behind it? I haven't been able to trace it. Electric Words (henceforth EW, also used to refer jointly to the three authors) is a report on work done to, and with, machine-readable dictionaries, in particular LDOCE, the Longman Dictionary of …
Data sparsity is a major problem in natural language processing: language is a system of rare events, so varied and complex that even with an extremely large corpus we can never accurately model all possible strings of words. This paper examines the use of skip-grams (a technique whereby n-grams are still stored to model …
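The skip-gram idea the abstract introduces can be sketched as follows: in a k-skip-n-gram, the n items need not be adjacent but may be separated by up to k intervening tokens in total. This is a minimal illustrative implementation, not the paper's own code.

```python
from itertools import combinations

def skip_grams(tokens, n=2, k=1):
    """Generate k-skip-n-grams: n-grams whose members may be
    separated by up to k intervening tokens in total."""
    grams = set()
    for i in range(len(tokens)):
        # A window of n-1+k following tokens allows up to k total skips.
        window = tokens[i + 1 : i + n + k]
        for combo in combinations(window, n - 1):
            grams.add((tokens[i],) + combo)
    return grams
```

For example, `skip_grams(["a", "b", "c", "d"], n=2, k=1)` yields the ordinary bigrams plus the one-skip pairs ("a", "c") and ("b", "d"), which is why skip-grams mitigate sparsity: they let a model observe word pairs that a contiguous n-gram window would miss.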
We describe a method for obtaining subject-dependent word sets relative to some (subject) domain. Using the subject classifications given in the machine-readable version of Longman's Dictionary of Contemporary English, we established subject-dependent cooccurrence links between words of the defining vocabulary to construct these "neighborhoods". Here, we …
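The construction described above can be sketched in miniature: count which defining-vocabulary words co-occur within definitions carrying the same subject code, then keep each word's most frequent companions as its neighborhood. The data layout here is hypothetical; the paper draws its subject codes and definitions from LDOCE.

```python
from collections import Counter, defaultdict

def subject_neighborhoods(definitions, top=3):
    """Build subject-dependent word neighborhoods.

    definitions: iterable of (subject_code, [defining_words]) pairs
    Returns: {subject: {word: [most frequent co-occurring words]}}
    """
    cooc = defaultdict(Counter)  # (subject, word) -> co-occurrence counts
    for subject, words in definitions:
        for w in set(words):
            for v in set(words):
                if v != w:
                    cooc[(subject, w)][v] += 1
    hoods = defaultdict(dict)
    for (subject, w), counts in cooc.items():
        hoods[subject][w] = [v for v, _ in counts.most_common(top)]
    return hoods
```

Because the counts are conditioned on the subject code, the same word can have different neighborhoods in different domains, which is the point of making the word sets subject-dependent.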
In this paper we report on the joint GE/Lockheed Martin/Rutgers/NYU natural language information retrieval project as related to the 5th Text Retrieval Conference (TREC-5). The main thrust of this project is to use natural language processing techniques to enhance the effectiveness of full-text document retrieval. Since our first TREC entry in 1992 (as NYU …
We describe a technique for automatically constructing a taxonomy of word senses from a machine-readable dictionary. Previous taxonomies developed from dictionaries have two properties in common. First, they are based on a somewhat loosely defined notion of the IS-A relation. Second, they require human intervention to identify the sense of the genus term …
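Once each sense's genus term (its IS-A parent) has been identified, the taxonomy is just a mapping from word to genus, and a sense's ancestry falls out by walking that mapping. A minimal sketch, assuming a precomputed `genus_of` mapping (hypothetical data; the paper derives these links from LDOCE sense definitions without human intervention):

```python
def ancestors(word, genus_of):
    """Walk IS-A links (word -> genus term) up to the root of the
    taxonomy, guarding against accidental cycles in the mapping."""
    chain = []
    seen = set()
    while word in genus_of and word not in seen:
        seen.add(word)
        word = genus_of[word]
        chain.append(word)
    return chain
```

The cycle guard matters in practice: dictionary definitions are occasionally circular (two words defined in terms of each other), and a naive walk would loop forever.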
In this paper, we describe both a multi-lingual, interlingual MT system (ULTRA) and a method of extracting lexical entries for it automatically from an existing machine-readable dictionary (LDOCE). We believe the latter is original, and the former, although not the first interlingual MT system by any means, may be the first that is symmetrically multi-lingual. …
This paper describes the largest scale annotation project involving the Enron email corpus to date. Over 12,500 emails were classified, by humans, into the categories "Business" and "Personal", and then subcategorised by type within these categories. The paper quantifies how well humans perform on this task (evaluated by inter-annotator agreement). It …
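Inter-annotator agreement of the kind mentioned above is commonly quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The truncated abstract does not specify which measure the paper uses, so this is a sketch of one standard choice, not necessarily the paper's.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label distribution.
    expected = sum(ca[c] * cb[c] for c in ca) / (n * n)
    return (observed - expected) / (1 - expected)
```

With a two-way Business/Personal split, chance agreement is high, so kappa gives a more honest picture of annotator reliability than raw percentage agreement.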