Kenneth Ward Church

The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond quicker than normal to the word "nurse" if it follows a highly associated word such as "doctor.") We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena,(More)
It is well-known that part of speech depends on context. The word “table,” for example, can be a verb in some contexts (e.g., “He will table the motion”) and a noun in others (e.g., “The table is ready”). A program has been written which tags each word in an input sentence with the most likely part of speech. The program produces the following output(More)
It is well-known that there are polysemous words like sentence whose “meaning” or “sense” depends on the context of use. We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget’s Thesaurus and Grolier’s Encyclopedia). As this(More)
There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let $A \in \mathbb{R}^{n \times D}$ be our n points in D dimensions. The method multiplies A by a random matrix $R \in \mathbb{R}^{D \times k}$, reducing the D dimensions down to just k for speeding up the(More)
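The multiplication described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's own method: the entries of R are drawn i.i.d. from a standard normal distribution, the product is scaled by 1/sqrt(k) so that squared distances are preserved in expectation, and the function names `random_projection` and `squared_distance` are hypothetical.

```python
import math
import random


def random_projection(A, k, seed=0):
    """Project the rows of A (n x D) down to k dimensions as B = A R / sqrt(k).

    Assumption: R is D x k with i.i.d. N(0, 1) entries. Other distributions
    are possible; this sketch does not claim to match the paper's choice.
    """
    rng = random.Random(seed)
    D = len(A[0])
    R = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(D)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(row[d] * R[d][j] for d in range(D)) for j in range(k)]
        for row in A
    ]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


if __name__ == "__main__":
    # Two made-up points in D = 500 dimensions, reduced to k = 400.
    rng = random.Random(1)
    A = [[rng.random() for _ in range(500)] for _ in range(2)]
    B = random_projection(A, 400, seed=7)
    print("original squared distance: ", squared_distance(A[0], A[1]))
    print("estimated squared distance:", squared_distance(B[0], B[1]))
```

Under these assumptions the estimate is unbiased, and its relative error shrinks roughly as sqrt(2/k), so k trades accuracy against the cost of storing and comparing the reduced vectors.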
We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus and Grolier's Encyclopedia). After using both the monolingual and bilingual classifiers for a few months, we have convinced ourselves that the performance is(More)
Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Both quantitative and qualitative methods have been tried, but much of this work has been stymied by difficulties in acquiring appropriate lexical resources, such as semantic networks and annotated corpora. In particular, much of the(More)
Low frequency words tend to be rich in content, and vice versa. But not all equally frequent words are equally meaningful. We will use inverse document frequency (IDF), a quantity borrowed from Information Retrieval, to distinguish words like somewhat and boycott. Both somewhat and boycott appeared approximately 1000 times in a corpus of 1989 Associated(More)
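To make the contrast concrete, here is a minimal sketch of IDF on a toy corpus. It assumes the common definition IDF(w) = -log2(df(w) / N), where df(w) is the number of documents containing w and N is the number of documents; the abstract does not state which variant the paper uses. The four tiny documents are invented for illustration and are not drawn from the corpus mentioned above, and the function name `inverse_document_frequency` is hypothetical.

```python
import math


def inverse_document_frequency(documents, word):
    """Return -log2(df / N) for `word` over tokenized `documents`.

    Assumption: this is the common textbook form of IDF, measured in bits.
    """
    df = sum(1 for doc in documents if word in doc)
    if df == 0:
        raise ValueError(f"{word!r} does not occur in any document")
    return -math.log2(df / len(documents))


# Invented toy corpus: both words occur four times in total, but
# "somewhat" is spread over every document while "boycott" is
# concentrated in a single one.
toy_documents = [
    ["prices", "rose", "somewhat"],
    ["talks", "were", "somewhat", "tense"],
    ["somewhat", "cooler", "weather"],
    ["somewhat", "boycott", "boycott", "boycott", "boycott"],
]

if __name__ == "__main__":
    for w in ("somewhat", "boycott"):
        print(w, inverse_document_frequency(toy_documents, w))
```

In this toy setting the two words have the same total frequency, yet "boycott" receives an IDF of 2 bits and "somewhat" an IDF of 0 bits, because only the document count enters the formula.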