Diana McCarthy

In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first, sense heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of …
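As a concrete illustration of the first sense heuristic (not the method evaluated in the paper), the sketch below picks the first-listed WordNet synset for a word, which approximates its most frequent sense; it assumes NLTK's WordNet interface with the data already downloaded.

```python
# Minimal sketch of the first (predominant) sense heuristic using NLTK's
# WordNet interface: WordNet lists a lemma's synsets roughly by frequency
# in SemCor, so the first-listed synset approximates the most common sense.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def first_sense(word, pos=wn.NOUN):
    """Return the first-listed WordNet synset for `word`, or None."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

# The same sense is chosen regardless of the surrounding context.
print(first_sense("bank"))   # Synset('bank.n.01')
print(first_sense("plant"))  # first-listed noun sense of "plant"
```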
OBJECTIVE To develop the Lens Opacities Classification System III (LOCS III) to overcome the limitations inherent in lens classification using LOCS II. These limitations include unequal intervals between standards, only one standard for color grading, use of integer grading, and wide 95% tolerance limits. DESIGN AND RESULTS The LOCS III contains an …
A multiword is compositional if its meaning can be expressed in terms of the meaning of its constituents. In this paper, we collect and analyse the compositionality judgments for a range of compound nouns using Mechanical Turk. Unlike existing compositionality datasets, our dataset has judgments on the contribution of constituent words as well as judgments …
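As an illustration only (not the dataset or analysis described above), compositionality is often quantified distributionally by comparing a compound's vector with a composed vector of its constituents; the sketch below uses a simple additive composition and cosine similarity over toy vectors.

```python
# Illustrative sketch: relate compositionality to distributional data by
# comparing the compound's vector with an additive composition of its
# constituents' vectors. The vectors below are toy placeholders; in
# practice they would come from a trained distributional model.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def additive_compositionality(compound_vec, head_vec, modifier_vec):
    """Score in [-1, 1]: similarity of the compound to head + modifier."""
    return cosine(compound_vec, head_vec + modifier_vec)

rng = np.random.default_rng(0)
swimming_pool = rng.normal(size=50)                    # toy compound vector
swimming, pool = rng.normal(size=50), rng.normal(size=50)
print(additive_compositionality(swimming_pool, pool, swimming))
```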
This work investigates the variation in a word’s distributionally nearest neighbours with respect to the similarity measure used. We identify one type of variation as being the relative frequency of the neighbour words with respect to the frequency of the target word. We then demonstrate a three-way connection between relative frequency of similar words, a …
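The following sketch, built on toy co-occurrence vectors and invented frequencies rather than the paper's data or measures, shows how a target word's nearest-neighbour ranking can change with the similarity measure, and how each neighbour's frequency relative to the target can be inspected alongside the ranking.

```python
# Hedged sketch: rank a target word's neighbours under two similarity
# measures (cosine vs. generalised Jaccard over co-occurrence counts) and
# report each neighbour's frequency relative to the target word.
import numpy as np

# word -> (toy co-occurrence vector, toy corpus frequency)
vocab = {"coffee":   (np.array([8, 2, 0, 5]), 1200),
         "tea":      (np.array([7, 3, 1, 4]),  900),
         "beverage": (np.array([4, 1, 0, 2]),  150),
         "cup":      (np.array([9, 0, 2, 6]), 3000)}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def jaccard(u, v):
    return np.minimum(u, v).sum() / np.maximum(u, v).sum()

target, (t_vec, t_freq) = "coffee", vocab["coffee"]
for measure in (cosine, jaccard):
    ranked = sorted((w for w in vocab if w != target),
                    key=lambda w: measure(t_vec, vocab[w][0]), reverse=True)
    # Each neighbour is shown with its frequency relative to the target.
    print(measure.__name__,
          [(w, round(vocab[w][1] / t_freq, 2)) for w in ranked])
```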
In this paper we describe the English Lexical Substitution task for SemEval. In the task, annotators and systems find an alternative substitute word or phrase for a target word in context. The task involves both finding the synonyms and disambiguating the context. Participating systems are free to use any lexical resource. There is a subtask which requires …
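A minimal baseline sketch of the task setup (not any participating system): candidate substitutes are drawn from WordNet synonyms of the target, and a placeholder ranking step stands in for the context-disambiguation component; NLTK's WordNet interface is assumed.

```python
# Sketch of a lexical substitution baseline: gather candidate substitutes
# from WordNet synonyms of the target, then rank them in context. The
# ranking function below is a placeholder, not a real context model.
from nltk.corpus import wordnet as wn

def candidate_substitutes(target, pos=wn.ADJ):
    """Collect synonym lemmas of `target` from all its WordNet synsets."""
    cands = set()
    for syn in wn.synsets(target, pos=pos):
        for lemma in syn.lemmas():
            if lemma.name() != target:
                cands.add(lemma.name().replace("_", " "))
    return cands

def rank_in_context(candidates, sentence):
    # Placeholder: a real system would score each candidate against the
    # sentence, e.g. with a language model or distributional similarity.
    return sorted(candidates)

print(rank_in_context(candidate_substitutes("bright"),
                      "He was a bright child."))
```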
Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems which back off to the predominant sense of a word when contextual clues are not strong enough. The domain of a document has a strong influence on the sense distribution of words, but it is not feasible to produce large manually …
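To make the back-off idea concrete, the sketch below tries a context-based disambiguator (NLTK's simple Lesk implementation, used purely as an example) and falls back to the first-listed WordNet sense when the contextual overlap is weak; the confidence test is a stand-in, not the method from the paper.

```python
# Sketch of back-off WSD: prefer a context-based choice when the context
# gives enough signal, otherwise fall back to the predominant (first) sense.
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

def disambiguate(context_tokens, word, pos=wn.NOUN, min_overlap=2):
    synsets = wn.synsets(word, pos=pos)
    if not synsets:
        return None
    chosen = lesk(context_tokens, word, pos=pos)
    if chosen is not None:
        # Crude confidence check: overlap between context and gloss words.
        overlap = set(context_tokens) & set(chosen.definition().split())
        if len(overlap) >= min_overlap:
            return chosen              # contextual clues look strong enough
    return synsets[0]                  # back off to the predominant sense

tokens = "the river overflowed its bank after heavy rain".split()
print(disambiguate(tokens, "bank"))
```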
We apply topic modelling to automatically induce word senses of a target word, and demonstrate that our word sense induction method can be used to automatically detect words with emergent novel senses, as well as token occurrences of those senses. We start by exploring the utility of standard topic models for word sense induction (WSI), with a …
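A hedged sketch of topic-model-based word sense induction, not the paper's model: each local context window of a target word is treated as a document, a small LDA model is fitted with gensim, and each topic is read as a candidate sense; the contexts below are toy examples.

```python
# Sketch of WSI with a standard topic model: contexts of the target word
# are the "documents", and each induced topic is a candidate sense.
from gensim import corpora
from gensim.models import LdaModel

contexts = [
    "deposit money savings account interest".split(),
    "cash loan branch account manager".split(),
    "river muddy water fishing shore".split(),
    "grassy river slope flood water".split(),
]

dictionary = corpora.Dictionary(contexts)
bow = [dictionary.doc2bow(ctx) for ctx in contexts]
lda = LdaModel(bow, num_topics=2, id2word=dictionary,
               random_state=0, passes=10)

for topic_id in range(2):
    # The top words of each topic sketch one induced sense of the target.
    print(topic_id, lda.show_topic(topic_id, topn=4))
```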
        …ion   Substance   Artifact   Location
NOWSD    0.9        36.4        1.3        1.6
SPass    0.2        54.3        0          5.2
FirstS   1.0        38.6        1.3        0.3
COMB     0.4        54.1        0.5        0
… types. PTCMs particularly tended to be less specific than ATCMs. Preference models at the PP slot suffered more from sparse data than subject and direct object slots. This was because the slot is less prevalent to start with, and also …