This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.
This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
This paper presents a new algorithm for automatically learning hypernym (is-a) relations from text, using "dependency path" features extracted from parse trees, and introduces a general-purpose formalization and generalization of these patterns.
This work proposes a novel algorithm for inducing semantic taxonomies that flexibly incorporates evidence from multiple classifiers over heterogeneous relationships to optimize the entire structure of the taxonomy, using knowledge of a word's coordinate terms to help in determining its hypernyms, and vice versa.
A discriminative classifier is trained over a wide variety of features derived from WordNet structure, corpus-based evidence, and evidence from other lexical resources, and a learned similarity measure outperforms previously proposed automatic methods for sense clustering on the task of predicting human sense merging judgments.
Experiments show that unigram language models smoothed using a normalized extension of stupid backoff and a simple queue for history retention perform well on the task of tracking broad topics in continuous streams of short texts from the microblogging service Twitter.
It is demonstrated how a recently developed statistical approach to mining such relations can be tailored to identify named entity hyponyms, and how, as a result, superior question answering performance can be obtained.
A novel framework for recognizing textual entailment is presented that focuses on the use of syntactic heuristics to recognize false entailment and demonstrates state-of-the-art performance on a widely used test set.
The data set made available by the PASCAL Recognizing Textual Entailment Challenge provides a great opportunity to focus on the very difficult task of determining whether one sentence is entailed by another, and an accuracy of 74% is in principle achievable for a system with access to a general-purpose thesaurus.