This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving…
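Reading the decomposition above as p(paraphrase | context) = Σ_z p(paraphrase | z) · p(z | context), with z a latent topic, the sense-selection step can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes both distributions have already been estimated by some topic model (topic inference is not shown), and the function names and toy probabilities are hypothetical.

```python
def paraphrase_given_context(paraphrase_topics, context_topics):
    """p(paraphrase | context) = sum over topics z of p(paraphrase | z) * p(z | context)."""
    return sum(p_given_z * context_topics.get(z, 0.0)
               for z, p_given_z in paraphrase_topics.items())


def best_sense(sense_paraphrases, context_topics):
    """Return the sense whose paraphrase is most probable given the context."""
    return max(sense_paraphrases,
               key=lambda s: paraphrase_given_context(sense_paraphrases[s], context_topics))


# Hypothetical toy estimates for the noun "plant" in an industrial context.
context_topics = {"industry": 0.8, "garden": 0.2}        # p(z | context)
sense_paraphrases = {
    "factory": {"industry": 0.05, "garden": 0.001},      # p("factory" | z)
    "flora": {"industry": 0.002, "garden": 0.04},        # p("flora" | z)
}
print(best_sense(sense_paraphrases, context_topics))     # prints "factory"
```

The latent topic acts as a bridge: a paraphrase never has to co-occur with the context directly, it only has to be probable under the topics the context favours.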
What is figurative language and why is it a problem? Unambiguous idiom: The 19th-century windjammers like Cutty Sark were able to maintain progress by and large even in bad wind conditions. Ambiguous idiom: The government agent spilled the beans on the secret dossier. When Peter reached for the salt, he knocked over the can and spilled the beans all over the…
We propose an unsupervised method for distinguishing literal and non-literal usages of idiomatic expressions. Our method determines how well a literal interpretation is linked to the overall cohesive structure of the discourse. If strong links can be found, the expression is classified as literal, otherwise as idiomatic. We show that this method can help to…
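The decision rule described above (strong links to the discourse mean literal, weak links mean idiomatic) can be sketched as below. The relatedness measure (cosine over toy word vectors), the "strongest link per component word" aggregation, the 0.5 threshold, and all names are assumptions made for illustration; the abstract does not specify how relatedness is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity; a stand-in (assumed) for a semantic relatedness measure."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cohesion_score(idiom_words, context_words, vectors):
    """Average over the idiom's component words of their strongest link to the context."""
    scores = []
    for w in idiom_words:
        if w not in vectors:
            continue
        links = [cosine(vectors[w], vectors[c])
                 for c in context_words if c in vectors and c != w]
        if links:
            scores.append(max(links))
    return sum(scores) / len(scores) if scores else 0.0


def classify_by_links(idiom_words, context_words, vectors, threshold=0.5):
    """Literal if the literal reading is well linked to the context, else idiomatic."""
    score = cohesion_score(idiom_words, context_words, vectors)
    return "literal" if score >= threshold else "idiomatic"


# Hypothetical 2-d vectors: (kitchen-ness, secrecy-ness).
vectors = {
    "spill": [0.6, 0.4], "bean": [1.0, 0.0],
    "salt": [0.9, 0.1], "can": [0.8, 0.2],
    "agent": [0.0, 1.0], "dossier": [0.1, 0.9],
}
print(classify_by_links(["spill", "bean"], ["salt", "can"], vectors))
print(classify_by_links(["spill", "bean"], ["agent", "dossier"], vectors))
```

With these toy vectors, "spill the beans" next to "salt" and "can" scores about 0.97 (literal), while next to "agent" and "dossier" it scores about 0.38 (idiomatic).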
We investigate the effectiveness of different linguistic cues for distinguishing literal and non-literal usages of potentially idiomatic expressions. We focus specifically on features that generalize across different target expressions. While idioms on the whole are frequent, instances of each particular expression can be relatively infrequent, and it will…
We propose a novel unsupervised approach for distinguishing literal and non-literal use of idiomatic expressions. Our model combines an unsupervised and a supervised classifier. The former bases its decision on the cohesive structure of the context and labels training data for the latter, which can then take a larger feature space into account. We show that…
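The two-stage setup described above can be sketched as a bootstrapping loop: an unsupervised labeler annotates raw contexts, and a supervised classifier is trained on those automatic labels. In this sketch the supervised learner is a swapped-in bag-of-words naive Bayes, the unsupervised labeler is a hypothetical stand-in that may abstain (returning `None`) on contexts it is unsure about, and all names and data are illustrative rather than the paper's.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (tokens, label). Returns label counts, per-label token counts, vocab."""
    label_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab


def predict(model, tokens):
    """Most probable label under multinomial naive Bayes with add-one smoothing."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for t in tokens:
            score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def bootstrap(unlabeled, unsupervised_label):
    """Label raw contexts automatically (skipping abstentions), then train the supervised model."""
    auto_labeled = []
    for tokens in unlabeled:
        label = unsupervised_label(tokens)
        if label is not None:
            auto_labeled.append((tokens, label))
    return train_naive_bayes(auto_labeled)


def toy_cohesion_labeler(tokens):
    """Hypothetical stand-in for a cohesion-based classifier; abstains when unsure."""
    if "salt" in tokens:
        return "literal"
    if "secret" in tokens:
        return "idiomatic"
    return None


contexts = [["salt", "can", "beans", "kitchen"], ["secret", "agent", "dossier"],
            ["salt", "pan", "kitchen"], ["secret", "government", "agent"], ["weather"]]
model = bootstrap(contexts, toy_cohesion_labeler)
print(predict(model, ["kitchen", "pan"]))
print(predict(model, ["agent", "government"]))
```

The supervised stage labels contexts such as `["kitchen", "pan"]` that contain none of the cues the toy labeler relies on, which is the point of handing its output to a learner with a larger feature space.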
We present a graph-based model for representing the lexical cohesion of a discourse. In the graph structure, vertices correspond to the content words of a text and edges connecting pairs of words encode how closely the words are related semantically. We show that such a structure can be used to distinguish literal and non-literal usages of multi-word…
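One way to use such a cohesion graph, sketched under stated assumptions: build a complete weighted graph over the content words, measure its connectivity as the average edge weight, and compare the graph with and without the expression's component words. If removing those words raises connectivity, they did not fit the discourse, and the usage is labelled non-literal. The decision rule, the cosine relatedness over toy vectors, and all names here are assumptions for illustration, not the paper's exact formulation.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity; assumed stand-in for semantic relatedness."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_cohesion_graph(words, vectors):
    """Vertices: distinct content words with a vector; edges: all pairs, weighted by relatedness."""
    vertices = [w for w in dict.fromkeys(words) if w in vectors]
    return {(a, b): cosine(vectors[a], vectors[b])
            for a, b in itertools.combinations(vertices, 2)}


def connectivity(graph):
    """Average edge weight of the graph (0.0 for an edgeless graph)."""
    return sum(graph.values()) / len(graph) if graph else 0.0


def classify_by_graph(context_words, idiom_words, vectors):
    """Non-literal if dropping the idiom's words makes the discourse more cohesive."""
    with_idiom = connectivity(build_cohesion_graph(context_words + idiom_words, vectors))
    remaining = [w for w in context_words if w not in idiom_words]
    without_idiom = connectivity(build_cohesion_graph(remaining, vectors))
    return "non-literal" if without_idiom > with_idiom else "literal"


# Hypothetical 3-d vectors: (food, secrecy, person).
vectors = {
    "spill": [1, 0, 1], "bean": [1, 0, 0],
    "salt": [1, 0, 0], "can": [1, 0, 0], "peter": [0, 0, 1],
    "agent": [0, 1, 0], "dossier": [0, 1, 0], "government": [0, 1, 1],
}
print(classify_by_graph(["salt", "can", "peter"], ["spill", "bean"], vectors))
print(classify_by_graph(["agent", "dossier", "government"], ["spill", "bean"], vectors))
```

In the kitchen context the idiom's words raise the average edge weight (about 0.33 without them, 0.58 with them), so the usage is literal; in the espionage context they lower it (about 0.80 versus 0.36), so it is non-literal.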
Information-theoretic measures are among the most standard techniques for evaluating clustering methods, including word sense induction (WSI) systems. Such measures rely on sample-based estimates of the entropy. However, the standard maximum likelihood estimates of the entropy are heavily biased, with the bias dependent on, among other things, the number…
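The downward bias of the maximum likelihood (plug-in) entropy estimate can be seen in a small simulation. The abstract does not say which corrected estimator the paper proposes, so this sketch uses the classic Miller-Madow correction, which adds (K - 1) / (2N) for K observed classes and N samples, purely as a familiar example. The experiment setup (4 equally likely clusters, 10 items) is hypothetical.

```python
import math
import random
from collections import Counter


def entropy_ml(labels):
    """Plug-in (maximum likelihood) entropy estimate, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def entropy_miller_madow(labels):
    """ML estimate plus the Miller-Madow correction (K - 1) / (2N),
    where K is the number of observed classes and N the sample size."""
    k = len(Counter(labels))
    return entropy_ml(labels) + (k - 1) / (2 * len(labels))


# Hypothetical experiment: 10 items drawn uniformly from 4 clusters,
# so the true entropy is log(4). Average each estimator over many samples.
random.seed(0)
true_h = math.log(4)
samples = [[random.randrange(4) for _ in range(10)] for _ in range(2000)]
mean_ml = sum(entropy_ml(s) for s in samples) / len(samples)
mean_mm = sum(entropy_miller_madow(s) for s in samples) / len(samples)
print(f"true {true_h:.3f}  ML {mean_ml:.3f}  Miller-Madow {mean_mm:.3f}")
```

With only 10 items, no sample can be perfectly uniform over 4 clusters, so every plug-in estimate falls below log(4); averaging shows the systematic gap that a bias correction is meant to close. The gap shrinks as N grows relative to the number of clusters, which is why it matters most for small per-word samples in WSI evaluation.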
Words: I hardly ever need to water the plant that grows in my yard because of the leak in the drains. Germany's coalition government has announced a reversal of policy that will see all the country's nuclear power plants phased out by 2022. Idioms: Dissanayake said that Kumaratunga was "playing with fire" after she accused military's top brass of…