Benjamin van Durme

Learn More
The paper 1 presents a lightweight knowledge-based reasoning framework for the JAVELIN open-domain Question Answering (QA) system. We propose a constrained representation of text meaning, along with a flexible unification strategy that matches questions with retrieved passages based on semantic similarities and weighted relations between words.
This paper improves an existing bilingual paraphrase extraction technique using mono-lingual distributional similarity to rerank candidate paraphrases. Raw monolingual data provides a complementary and orthogonal source of information that lessens the commonly observed errors in bilingual pivot-based methods. Our experiments reveal that monolingual scoring(More)
Efforts to automatically acquire world knowledge from text suffer from the lack of an easy means of evaluating the resulting knowledge. We describe initial experiments using Mechanical Turk to crowdsource evaluation to non-experts for little cost, resulting in a collection of factoids with associated quality judgements. We describe the method of acquiring(More)
We present results for a system designed to perform Open Knowledge Extraction, based on a tradition of compositional language processing, as applied to a large collection of text derived from the Web. Evaluation through manual assessment shows that well-formed propositions of reasonable quality, representing general world knowledge, given in a logical form(More)
This work surveys existing evaluation methodologies for the task of sentence compression, identifies their shortcomings, and proposes alternatives. In particular, we examine the problems of evaluating paraphrastic compression and comparing the output of different models. We demonstrate that compression rate is a strong predictor of compression quality and(More)
We present a substitution-only approach to sentence compression which " tightens " a sentence by reducing its character length. Replacing phrases with shorter paraphrases yields paraphrastic compressions as short as 60% of the original length. In support of this task, we introduce a novel technique for re-ranking paraphrases extracted from bilingual(More)
We establish a novel task in the spirit of news sum-marization and topic detection and tracking (TDT): daily determination of the topics newly popular with Wikipedia readers. Central to this effort is a new public dataset consisting of the hourly page view statistics of all Wikipedia articles over the last three years. We give baseline results for the tasks(More)
We provide a model that extends the split-merge framework of Petrov et al. (2006) to jointly learn latent annotations and Tree Substitution Grammars (TSGs). We then conduct a variety of experiments with this model, first inducing grammars on a portion of the Penn Treebank and the Korean Treebank 2.0, and next experimenting with grammar refinement from a(More)
  • 1