In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first-sense, heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of...
Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems that back off to the predominant sense of a word when contextual clues are not strong enough. The domain of a document has a strong influence on the sense distribution of words, but it is not feasible to produce large manually...
There has been a great deal of recent research into word sense disambiguation, particularly since the inception of the Senseval evaluation exercises. Because a word often has more than one meaning, resolving word sense ambiguity could benefit applications that need some level of semantic interpretation of language input. A major problem is that the accuracy...
Maximum Entropy (MaxEnt) models (Jaynes, 1957) are exponential models that implement the intuition that if there is no evidence to favour one alternative solution above another, both alternatives should be equally likely. In order to accomplish this, as much information as possible about the process you want to model must be collected. This information...
We use seven machine learning algorithms for one task: identifying base noun phrases. The results have been processed by different system combination methods and all of these outperformed the best individual result. We have applied the seven learners with the best combinator, a majority vote of the top five systems, to a standard data set...
We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar, and a model for robust parsing which combines linguistic sources of information and statistical sources of information. We discuss test results suggesting that grammatical...
Previous work has demonstrated the success of statistical language models when enough training data is available [1]. Despite that, grammar-based systems are proving the preferred choice in successful commercial systems such as HeyAnita [2], BeVocal [3] and Tellme [4], largely due to the difficulty involved in obtaining a corpus of training data. Here...
In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The first (or predominant) sense heuristic assumes the availability of hand-tagged data. Whilst there are hand-tagged corpora available for some languages, these are relatively small in...
In this paper we show that an unsupervised method for ranking word senses automatically can be used to identify infrequently occurring senses. We demonstrate this using a ranking of noun senses derived from the BNC and evaluating on the sense-tagged text available in both SemCor and the SENSEVAL-2 English all-words task. We show that the method does well at...