We demonstrate the benefits of a multilingual approach to automatic lexical semantic verb classification based on statistical analysis of corpora in multiple languages. Our research incorporates two interrelated threads. In one, we exploit the similarities in the crosslinguistic classification of verbs, to extend work on English verb classification to a new …
We present a new, efficient unsupervised approach to the segmentation of corpora into multiword units. Our method involves initial decomposition of common n-grams into segments which maximize within-segment predictability of words, and then further refinement of these segments into a multiword lexicon. Evaluating in four large, distinct corpora, we show …
Though the multiword lexicon has long been of interest in computational linguistics, most relevant work is targeted at only a small portion of it. Our work is motivated by the needs of learners for more comprehensive resources reflecting formulaic language that goes beyond what is likely to be codified in a dictionary. Working from an initial sequential …
Many NLP applications require that texts be classified based on their semantic distance (how similar or different the texts are). For example, comparing the text of a new document to those of documents of known topics can help identify the topic of the new text. Typically, a distributional distance is used to capture the implicit semantic distance between …
Semantic similarity measures have focused on individual word senses. However, in many applications, it may be informative to compare the overall sense distributions for two different contexts. We propose a new method for comparing two probability distributions over WordNet, which captures in a single measure the aggregate semantic distance of the component …
We propose a new method for detecting verb alternations, by comparing the probability distributions over WordNet classes occurring in two potentially alternating argument positions. Existing distance measures compute only the distributional distance, and do not take into account the semantic similarity between WordNet senses across the distributions. Our …
Lexicons of word difficulty are useful for various educational applications, including readability classification and text simplification. In this work, we explore automatic creation of these lexicons using methods which go beyond simple term frequency, but without relying on age-graded texts. In particular, we derive information for each word type from …
We investigate the use of multilingual data in the automatic classification of English verbs, and show that there is a useful transfer of information across languages. Specifically, we experiment with three lexical semantic classes of English verbs. We collect statistical features over a sample of English verbs from each of the classes, as well as over …
Identifying non-compositional idioms in text using WordNet synsets (2007). Any natural language processing system that has no knowledge of non-compositional idioms and their interpretation will make mistakes. Previous authors have attempted to automatically identify these expressions through the property of non-substitutability: similar words cannot …