Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora

  • David Yarowsky
  • Published in
    International Conference on Computational Linguistics
    23 August 1992
  • Computer Science
This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. The selection of categories is accomplished by identifying and weighting words that are indicative of each category when seen in context, using a Bayesian theoretical framework. Other statistical approaches have required special corpora or hand-labeled training examples for much of the lexicon. Our use of class models overcomes this…
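A minimal Python sketch of Bayesian category scoring, assuming toy tokenized training contexts for each category and add-alpha smoothing (both are assumptions for illustration, not details taken from the paper). Each context word contributes a weight log(Pr(w|cat)/Pr(w)), and a category's score is its log prior plus the sum of those weights. The class `CategoryModel`, the category labels and the training data are hypothetical.

```python
import math
from collections import Counter


class CategoryModel:
    """Illustrative Bayesian scorer over thesaurus-style categories."""

    def __init__(self, category_contexts, alpha=0.5):
        # category_contexts: {category: [list of context tokens, ...]}
        self.alpha = alpha
        self.counts = {c: Counter(t for ctx in ctxs for t in ctx)
                       for c, ctxs in category_contexts.items()}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        self.global_counts = Counter()
        for cnt in self.counts.values():
            self.global_counts.update(cnt)
        self.global_total = sum(self.global_counts.values())
        self.vocab_size = len(self.global_counts) + 1  # +1 slot for unseen words
        n = sum(len(ctxs) for ctxs in category_contexts.values())
        self.log_prior = {c: math.log(len(ctxs) / n)
                          for c, ctxs in category_contexts.items()}

    def _p(self, count, total):
        # Add-alpha smoothed probability estimate.
        return (count + self.alpha) / (total + self.alpha * self.vocab_size)

    def weight(self, word, cat):
        """log(Pr(w | cat) / Pr(w)): how indicative `word` is of `cat`."""
        p_w_cat = self._p(self.counts[cat][word], self.totals[cat])
        p_w = self._p(self.global_counts[word], self.global_total)
        return math.log(p_w_cat / p_w)

    def score(self, context, cat):
        # Log prior plus the summed evidence of every context word.
        return self.log_prior[cat] + sum(self.weight(w, cat) for w in context)

    def best_category(self, context):
        return max(self.counts, key=lambda c: self.score(context, c))


# Hypothetical training contexts for two categories of the ambiguous word "crane".
training = {
    "TOOLS/MACHINERY": [["crane", "lift", "steel", "construction"],
                        ["crane", "operator", "load", "site"]],
    "ANIMAL/INSECT": [["crane", "bird", "wing", "nest"],
                      ["crane", "species", "migrate", "wetland"]],
}
model = CategoryModel(training)
print(model.best_category(["the", "bird", "built", "a", "nest"]))  # ANIMAL/INSECT
```

Here "crane" occurs equally often in both categories, so it receives the same weight in each and only the more category-specific context words decide the outcome.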

Word Sense Disambiguation of Adjectives Using Probabilistic Networks

It is shown how tagged corpora and additional context can be incorporated easily to improve accuracy, and how this technique can be used to disambiguate other types of word pairs, such as verb-noun and adverb-verb pairs.

Methods of Category Classification Applied to Word-sense Disambiguation and Discourse Analysis: A Proposal

This work will use a richer class of statistical models than those previously used in NLP, along with a set of tools for estimating the parameters of the chosen model from untagged data and for resolving interdependent ambiguities.

Word Sense Disambiguation based on Semantic Density

A metric is introduced to measure semantic density and to rank all possible combinations of the senses of two words; it achieves a precision of 58% in identifying the correct senses of both words at the same time.

Distinguishing Word Senses in Untagged Text

An experimental comparison of three unsupervised learning algorithms that distinguish the senses of an ambiguous word in untagged text: McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm.
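Of the three algorithms, EM is the easiest to sketch compactly. The Python fragment below is a minimal illustration, assuming each context is a bag of words and each sense is a latent class in a naive Bayes mixture. The function name `em_senses`, the smoothing constant, the restart scheme and the toy contexts are assumptions made for this sketch, not details of the compared systems.

```python
import math
import random
from collections import Counter


def _logsumexp(xs):
    # Numerically stable log(sum(exp(x) for x in xs)).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def em_senses(contexts, k=2, iters=50, alpha=0.1, restarts=5, seed=0):
    """Cluster contexts of one ambiguous word into k latent senses with EM.

    Model (an assumption for this sketch): a mixture of multinomials, i.e.
    naive Bayes with an unobserved sense label. Returns (labels, log_likelihood)
    for the restart with the highest log-likelihood.
    """
    docs = [Counter(ctx) for ctx in contexts]
    vocab = sorted({w for d in docs for w in d})
    rng = random.Random(seed)
    best = None
    for _restart in range(restarts):
        # Random soft assignment of every context to the k senses.
        resp = []
        for _doc in docs:
            row = [rng.random() + 1e-3 for _ in range(k)]
            resp.append([r / sum(row) for r in row])
        for _it in range(iters):
            # M-step: sense priors and add-alpha smoothed word distributions.
            priors = [(sum(r[j] for r in resp) + 1e-9) / (len(docs) + k * 1e-9)
                      for j in range(k)]
            theta = []
            for j in range(k):
                counts = {w: alpha for w in vocab}
                for r, d in zip(resp, docs):
                    for w, c in d.items():
                        counts[w] += r[j] * c
                total = sum(counts.values())
                theta.append({w: c / total for w, c in counts.items()})
            # E-step: posterior probability of each sense for each context.
            log_lik, resp = 0.0, []
            for d in docs:
                scores = [math.log(priors[j])
                          + sum(c * math.log(theta[j][w]) for w, c in d.items())
                          for j in range(k)]
                z = _logsumexp(scores)
                log_lik += z
                resp.append([math.exp(s - z) for s in scores])
        labels = [max(range(k), key=lambda j: r[j]) for r in resp]
        if best is None or log_lik > best[1]:
            best = (labels, log_lik)
    return best


# Toy contexts of "bank" (target word removed): river sense vs. money sense.
contexts = [
    ["river", "water", "fish", "boat"],
    ["river", "water", "shore"],
    ["water", "fish", "shore", "river"],
    ["money", "loan", "interest"],
    ["loan", "money", "deposit"],
    ["interest", "deposit", "money", "account"],
]
labels, log_lik = em_senses(contexts)
print(labels)
```

Because EM only reaches a local optimum, the sketch keeps the restart with the highest log-likelihood. The numeric cluster labels are arbitrary; only the grouping of contexts is meaningful.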

Corpus-Based Statistical Sense Resolution

The three corpus-based statistical sense resolution methods studied here attempt to infer the correct sense of a polysemous word by using knowledge about patterns of word cooccurrences.

Principled Disambiguation: Discriminating Adjective Senses with Modified Nouns

This paper argues for a linguistically principled approach to disambiguation, in which relevant contextual clues are narrowly defined, in syntactic and semantic terms, and in which only highly reliable clues are exploited.

Combining machine readable lexical resources and bilingual corpora for broad word sense disambiguation

A new approach to word sense disambiguation (WSD) based on automatically acquired "word sense division" using the English-Chinese bilingual version (LecDOCE) of the Longman Dictionary of Contemporary English (LDOCE).

Automatic Word Sense Disambiguation Using Cooccurrence and Hierarchical Information

We review in detail a polished version of the systems with which we participated in the SENSEVAL-2 English tasks (all-words and lexical sample). It is based on a combination of

Combining Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation

A method to combine a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus is presented, enabling us to construct complete taxonomies for Spanish and French.

Similarity-based Word Sense Disambiguation

Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.

A method for disambiguating word senses in a large corpus

The proposed method, designed to disambiguate senses that are usually associated with different topics, uses a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval.

Noun Homograph Disambiguation Using Local Context in Large Text Corpora

An accurate, relatively inexpensive method for disambiguating noun homographs using large text corpora is presented; in comparison with other approaches, including those based on machine-readable dictionaries and on unrestricted text, the use of training instances is determined to be a crucial difference.

Word-Sense Disambiguation Using Statistical Methods

A statistical technique for assigning senses to words is described; when it was incorporated into the statistical machine translation system, the system's error rate decreased by thirteen percent.

Subject-Dependent Co-Occurrence and Word Sense Disambiguation

Using the subject classifications given in the machine-readable version of Longman's Dictionary of Contemporary English, subject-dependent co-occurrence links between words of the defining vocabulary are established to construct "neighborhoods", and the application of these neighborhoods to information retrieval is described.

Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone

This procedure uses available dictionaries, so it can process any text, and it relies solely on the immediate context to decide which sense of a word is intended (in written English).
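A minimal Python sketch of dictionary-gloss overlap in the spirit of this procedure, assuming toy glosses written for this illustration (not quoted from any dictionary) and a small hypothetical stop-word list. For a two-word phrase such as pine cone, the sketch chooses the pair of senses whose glosses share the most content words; the function names and sense labels are hypothetical.

```python
import re
from itertools import product

# Hypothetical stop-word list; a real system would use a fuller one.
STOP = {"a", "an", "the", "of", "or", "to", "in", "for", "with", "and", "on", "that"}


def content_words(text):
    """Lower-cased alphabetic tokens of a gloss, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}


def best_sense_pair(glosses_a, glosses_b):
    """Return the (sense_a, sense_b) pair whose glosses overlap most, plus the overlap size."""
    best_pair, best_overlap = None, -1
    for (sa, ga), (sb, gb) in product(glosses_a.items(), glosses_b.items()):
        overlap = len(content_words(ga) & content_words(gb))
        if overlap > best_overlap:
            best_pair, best_overlap = (sa, sb), overlap
    return best_pair, best_overlap


# Toy glosses written for this example (hypothetical sense labels).
PINE = {
    "pine/tree": "a kind of evergreen tree with needle leaves",
    "pine/waste": "to waste away through sorrow or illness",
}
CONE = {
    "cone/solid": "a solid body which narrows to a point",
    "cone/fruit": "the woody fruit of an evergreen tree",
    "cone/ice-cream": "a crisp wafer that holds a scoop of ice cream",
}

print(best_sense_pair(PINE, CONE))  # (('pine/tree', 'cone/fruit'), 2)
```

Overlap counting of this kind is sensitive to gloss wording: inflected forms such as tree and trees would not match without stemming, which this sketch omits.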

Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionaries

In this paper, we describe a means for automatically building very large neural networks (VLNNs) from definition texts in machine-readable dictionaries, and demonstrate the use of these networks for

Learning to disambiguate

  • S. Weiss
  • Computer Science
    Inf. Storage Retr.
  • 1973

Two Languages Are More Informative Than One

A new approach is presented for resolving lexical ambiguities in one language using statistical data on lexical relations in another language, with a statistical model for the selection mechanism.
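A much-simplified Python sketch of the selection idea, assuming a hypothetical table of syntactic-tuple counts from a target-language corpus. Each combination of candidate translations is scored by how often the corresponding tuple occurs there, and a choice is made only when the winner clearly dominates. The ratio threshold `min_ratio` is a crude stand-in for the paper's statistical model, and the function name, candidates and counts are illustrative only.

```python
from itertools import product


def choose_translation(head_options, dep_options, target_counts, min_ratio=2.0):
    """Pick the (head, dependent) translation pair most frequent in target-language tuples.

    Returns None (no decision) unless the best pair occurs at least `min_ratio`
    times as often as the runner-up; a hypothetical threshold, not a significance test.
    """
    scored = sorted(
        ((target_counts.get(pair, 0), pair) for pair in product(head_options, dep_options)),
        reverse=True,
    )
    best_n, best_pair = scored[0]
    second_n = scored[1][0] if len(scored) > 1 else 0
    if best_n == 0 or best_n < min_ratio * second_n:
        return None
    return best_pair


# Hypothetical verb-object counts from an English corpus, used to choose among
# candidate English translations of a source-language verb.
counts = {("sign", "treaty"): 12, ("seal", "treaty"): 3}
print(choose_translation(["sign", "seal"], ["treaty"], counts))  # ('sign', 'treaty')
```

Returning None when the evidence is weak mirrors the general idea of making a selection only when the target-language statistics are decisive.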

An Experiment in Computational Discrimination of English Word Senses

Experimental results are presented which suggest that people can consistently determine in which of several given senses a word is being used in text, simply by examining the half dozen or so words just before and just after the word in focus.
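Experiments of this kind rest on extracting a fixed window of words on each side of the word in focus. A minimal Python sketch, assuming whitespace tokenization and a hypothetical window size `k` per side; the function name `windows` is likewise illustrative.

```python
def windows(tokens, target, k=3):
    """Yield (left, right) contexts of up to k tokens around each occurrence of target.

    The word in focus itself is excluded; windows are truncated at text boundaries.
    """
    for i, tok in enumerate(tokens):
        if tok == target:
            yield tokens[max(0, i - k):i], tokens[i + 1:i + 1 + k]


text = "the bank of the river and the bank of england".split()
for left, right in windows(text, "bank", k=2):
    print(left, "[bank]", right)
```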

Disambiguation by short contexts

This paper describes a technique of great help in many text-processing situations, namely disambiguation by short contexts, and reports on an experiment recently conducted to test its validity and scope.