This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word–word dependencies. The resulting corpus, CCGbank, includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium, and has been used to train…
We propose to use the visual denotations of linguistic expressions (i.e., the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e., a…
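Because a denotation here is simply a set of images, similarity between two expressions can be computed from set overlap. The sketch below is illustrative only: it assumes denotations are given as Python sets of image IDs drawn from a pool of `n_images` images, and uses two common overlap-based measures, a conditional probability and normalized pointwise mutual information (nPMI). The function names and the toy data are hypothetical; this is not presented as the exact metric definition from the paper.

```python
import math

def conditional_prob(den_x: set, den_y: set) -> float:
    """P(x | y): share of the images described by y that x also describes."""
    if not den_y:
        return 0.0
    return len(den_x & den_y) / len(den_y)

def npmi(den_x: set, den_y: set, n_images: int) -> float:
    """Normalized pointwise mutual information of two denotations, in [-1, 1]."""
    joint = len(den_x & den_y) / n_images
    if joint == 0.0:
        return -1.0  # the expressions never describe the same image
    if joint == 1.0:
        return 1.0  # both expressions describe every image (convention)
    p_x = len(den_x) / n_images
    p_y = len(den_y) / n_images
    pmi = math.log(joint / (p_x * p_y))
    return pmi / -math.log(joint)

# Hypothetical toy denotations over a pool of 5 images (IDs 1..5).
denotations = {
    "dog": {1, 2, 3},
    "animal": {1, 2, 3, 4},
    "car": {5},
}

print(conditional_prob(denotations["animal"], denotations["dog"]))  # every dog image is an animal image
print(npmi(denotations["dog"], denotations["animal"], 5))
```

Note that the conditional probability is asymmetric: in the toy data, every image described by "dog" is also described by "animal", but not vice versa, which is the kind of directional signal useful for entailment-style inference.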
We present an algorithm that translates the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations. To do this, we have needed to make several systematic changes to the Treebank, which have the effect of cleaning up a number of errors and inconsistencies. This process has yielded a cleaner treebank that can potentially be used in any…
The ability to associate images with natural language sentences that describe what is depicted in them is a hallmark of image understanding, and a prerequisite for applications such as sentence-based image search. In analogy to image search, we propose to frame sentence-based image annotation as the task of ranking a given pool of captions. We introduce a…
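Framing annotation as ranking means that, for each image, every caption in a fixed pool is scored and sorted, and a system is judged by where the correct caption lands. A minimal sketch, assuming a hypothetical scoring function `score(image, caption)` (here a toy word-overlap score against a set of image tags, which stands in for a learned image–sentence model):

```python
from typing import Callable, Sequence

def rank_of_gold(image, captions: Sequence[str], gold_index: int,
                 score: Callable[[object, str], float]) -> int:
    """Sort the caption pool by descending score; return the 1-based rank of the gold caption."""
    order = sorted(range(len(captions)),
                   key=lambda i: score(image, captions[i]),
                   reverse=True)
    return order.index(gold_index) + 1

def overlap_score(image_tags: set, caption: str) -> float:
    """Hypothetical toy score: number of image tags that appear as caption words."""
    return float(len(image_tags & set(caption.lower().split())))

image = {"dog", "grass"}  # hypothetical tags standing in for image features
pool = ["A dog runs on the grass", "A man rides a bike", "A dog sleeps"]
print(rank_of_gold(image, pool, 0, overlap_score))
```

The same ranking function can be reused in the opposite direction (ranking a pool of images for a sentence) by swapping which side is fixed and which is the pool.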
Crowdsourcing approaches such as Amazon's Mechanical Turk (MTurk) make it possible to annotate or collect large amounts of linguistic data at relatively low cost and high speed. However, MTurk offers only limited control over who is allowed to participate in a particular task. This is particularly problematic for tasks requiring free-form text entry…
Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The…
This paper compares a number of generative probability models for a wide-coverage Combinatory Categorial Grammar (CCG) parser. These models are trained and tested on a corpus obtained by translating the Penn Treebank trees into CCG normal-form derivations. According to an evaluation of unlabeled word–word dependencies, our best model achieves a performance…
This paper describes and analyzes our SemEval 2014 Task 1 system. Its features are based on distributional and denotational similarities; word alignment; negation; and hypernym/hyponym, synonym, and antonym relations. SemEval 2014 Task 1 (Marelli et al., 2014a) evaluates system predictions of semantic relatedness (SR) and textual entailment (TE)…