Vivian Tsang

We demonstrate the benefits of a multilingual approach to automatic lexical semantic verb classification based on statistical analysis of corpora in multiple languages. Our research incorporates two interrelated threads. In one, we exploit the similarities in the crosslinguistic classification of verbs to extend work on English verb classification to a new …
We present a new, efficient unsupervised approach to the segmentation of corpora into multiword units. Our method involves initial decomposition of common n-grams into segments which maximize within-segment predictability of words, and then further refinement of these segments into a multiword lexicon. Evaluating on four large, distinct corpora, we show …
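The idea of splitting an n-gram where within-segment predictability drops can be illustrated with a toy sketch. This is not the paper's algorithm, only a minimal stand-in: it estimates bigram conditional probabilities from a tiny invented corpus and cuts a candidate n-gram at its least-predictable word boundary.

```python
from collections import Counter

# Toy corpus; in practice the counts would come from a large corpus.
corpus = [
    "new york city is big".split(),
    "new york city is old".split(),
    "the city mayor spoke".split(),
    "city council meets today".split(),
]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    unigrams.update(sent)
    bigrams.update(zip(sent, sent[1:]))

def cond_prob(w1, w2):
    """Estimate P(w2 | w1) from the toy corpus counts."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

def split_ngram(ngram):
    """Cut the n-gram at the boundary with the lowest predictability,
    so each resulting segment keeps high internal word predictability."""
    probs = [cond_prob(a, b) for a, b in zip(ngram, ngram[1:])]
    cut = probs.index(min(probs)) + 1
    return ngram[:cut], ngram[cut:]

left, right = split_ngram(["new", "york", "city", "is", "big"])
# left is the high-predictability unit ["new", "york", "city"]
```

A real system would apply such splits recursively and filter the resulting segments into a lexicon; this sketch only shows the single-cut predictability criterion.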
Modelling Semantic Knowledge for a Word Completion Task. Jianhua Li, Master of Science, Graduate Department of Computer Science, University of Toronto, 2006. To assist people with physical disabilities in text entry, we have studied the contribution of semantic knowledge to the word completion task. We first constructed a semantic knowledge base (SKB) that …
Lexicons of word difficulty are useful for various educational applications, including readability classification and text simplification. In this work, we explore automatic creation of these lexicons using methods that go beyond simple term frequency, but without relying on age-graded texts. In particular, we derive information for each word type from the …
Second Language Information Transfer in Automatic Verb Classification – A Preliminary Investigation. Vivian Tsang, Master of Science, Graduate Department of Computer Science, University of Toronto, 2001. Lexical semantic classes incorporate both syntactic and semantic information about verbs. Lexical semantic classification of verbs provides a great deal of useful …
Though the multiword lexicon has long been of interest in computational linguistics, most relevant work targets only a small portion of it. Our work is motivated by learners' need for more comprehensive resources reflecting formulaic language beyond what is likely to be codified in a dictionary. Working from an initial sequential …
Many NLP applications require classifying texts by their semantic distance (how similar or different the texts are). For example, comparing the text of a new document to those of documents on known topics can help identify the topic of the new text. Typically, a distributional distance is used to capture the implicit semantic distance between …
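The topic-identification example can be made concrete with one standard distributional distance, cosine distance over bag-of-words count vectors. This is an illustrative stand-in, not the measure proposed in the work; the documents below are invented.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words count vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between the two texts' count vectors."""
    va, vb = bow(a), bow(b)
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return 1.0 - dot / (na * nb)

doc = "stock markets fell sharply"
topic_finance = "stock markets trading shares"
topic_sports = "football match goals season"
# The new document is distributionally closer to the finance topic.
```

Because such surface-overlap measures miss synonymy and relatedness, distances that incorporate explicit semantic knowledge can improve on them, which motivates the line of work described here.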
Semantic similarity measures have focused on individual word senses. However, in many applications it may be informative to compare the overall sense distributions of two different contexts. We propose a new method for comparing two probability distributions over WordNet, which captures in a single measure the aggregate semantic distance of the component …
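The proposed measure is the paper's own contribution; as a point of comparison, a standard way to compare two probability distributions over the same support is Jensen-Shannon divergence, sketched below on two invented sense distributions. Note that plain JSD only measures distributional overlap and ignores how semantically close the individual senses are, which is exactly the gap an aggregate semantic distance over WordNet is meant to fill.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits (terms with p_i = 0 vanish)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric, bounded in [0, 1] bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two toy sense distributions over the same four hypothetical senses.
context_a = [0.7, 0.2, 0.1, 0.0]
context_b = [0.1, 0.2, 0.3, 0.4]
```

JSD treats the four senses as interchangeable bins; a WordNet-aware measure would additionally weight mismatches by the semantic distance between the senses involved.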