Named Entity Recognition in Tweets: An Experimental Study
The novel T-NER system doubles F1 score compared with the Stanford NER system, and leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.
Open Language Learning for Information Extraction
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary…
Open Information Extraction: The Second Generation
The second generation of Open IE systems is described; these systems rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.
Adversarial classification
This paper views classification as a game between the classifier and the adversary, and produces a classifier that is optimal given the adversary's optimal strategy. Experiments show that this approach can greatly outperform a classifier learned in the standard way.
Open domain event extraction from twitter
TwiCal, the first open-domain event-extraction and categorization system for Twitter, is described, and a novel approach for discovering important event categories and classifying extracted events based on latent variable models is presented.
When is Temporal Planning Really Temporal?
A complete state-space temporal planning algorithm is designed, which the authors hope will be able to achieve high performance by leveraging the heuristics that power decision epoch planners.
Generating Coherent Event Schemas at Scale
This work presents a novel approach to inducing open-domain event schemas that overcomes limitations of Chambers and Jurafsky's (2009) schemas and uses co-occurrence statistics of semantically typed relational triples, which it calls Rel-grams (relational n-grams).
Towards Coherent Multi-Document Summarization
G-FLOW is evaluated on Mechanical Turk, and it is found that it generates dramatically better summaries than an extractive summarizer based on a pipeline of state-of-the-art sentence selection and reordering components, underscoring the value of the joint model.
A Latent Dirichlet Allocation Method for Selectional Preferences
LDA-SP, which utilizes LinkLDA to model selectional preferences, combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation's preferences, but it is competitive with non-class-based methods in predictive power.
POMDP-based control of workflows for crowdsourcing
Crowdsourcing, the outsourcing of tasks to a crowd of unknown people ("workers") in an open call, is rapidly rising in popularity. It is already being heavily used by numerous employers ("requesters")…