We demonstrate the benefits of a multilingual approach to automatic lexical semantic verb classification based on statistical analysis of corpora in multiple languages. Our research incorporates two interrelated threads. In one, we exploit the similarities in the crosslinguistic classification of verbs to extend work on English verb classification to a new …
Lymphoma, a condition that is characterized by an abnormal growth of lymphocytes, can be classified into two main categories: Hodgkin lymphoma (HL) and non-Hodgkin lymphomas (NHL). Anaplastic large cell lymphoma (ALCL) is a subset of peripheral T-cell NHL. In …
Semantic similarity measures have focused on individual word senses. However, in many applications, it may be informative to compare the overall sense distributions for two different contexts. We propose a new method for comparing two probability distributions over WordNet, which captures in a single measure the aggregate semantic distance of the component …
Many NLP applications entail that texts are classified based on their semantic distance (how similar or different the texts are). For example, comparing the text of a new document to those of documents of known topics can help identify the topic of the new text. Typically, a distributional distance is used to capture the implicit semantic distance between …
We present a new, efficient unsupervised approach to the segmentation of corpora into multiword units. Our method involves initial decomposition of common n-grams into segments which maximize within-segment predictability of words, and then further refinement of these segments into a multiword lexicon. Evaluating in four large, distinct corpora, we show …
We propose a new method for detecting verb alternations, by comparing the probability distributions over WordNet classes occurring in two potentially alternating argument positions. Existing distance measures compute only the distributional distance, and do not take into account the semantic similarity between WordNet senses across the distributions. Our …
Lexicons of word difficulty are useful for various educational applications, including readability classification and text simplification. In this work, we explore automatic creation of these lexicons using methods which go beyond simple term frequency, but without relying on age-graded texts. In particular, we derive information for each word type from …
  • Faye Rochelle Baron, Tobi Kral, Stephanie Horn, Amber Wilcox-O'Hearn, Harold Connamacher, Richard Krueger +13 others
  • 2007
Identifying non-compositional idioms in text using WordNet synsets (2007). Any natural language processing system that does not have a knowledge of non-compositional idioms and their interpretation will make mistakes. Previous authors have attempted to automatically identify these expressions through the property of non-substitutability: similar words cannot …