Constrained K-means Clustering with Background Knowledge
This paper demonstrates how the popular k-means clustering algorithm can be profitably modified to make use of information about the problem domain that is available in addition to the data instances themselves.
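One way to picture k-means that respects background knowledge is a constrained assignment step: each point goes to the nearest center whose choice breaks no pairwise constraint, and the run fails if no such center exists. The sketch below assumes the background knowledge arrives as must-link and cannot-link index pairs and uses hypothetical names (`cop_kmeans`, `iters`, `seed`). It is a simplified illustration under those assumptions, not the paper's exact procedure.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def _sq_dist(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _adjacency(pairs: Sequence[Pair]) -> Dict[int, Set[int]]:
    # Symmetric lookup table: index -> set of constrained partner indices.
    adj: Dict[int, Set[int]] = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def _violates(i: int, c: int, labels: Dict[int, int],
              must: Dict[int, Set[int]], cannot: Dict[int, Set[int]]) -> bool:
    # Putting point i in cluster c breaks a constraint if an already-assigned
    # must-link partner is elsewhere, or a cannot-link partner is already in c.
    if any(j in labels and labels[j] != c for j in must.get(i, ())):
        return True
    return any(j in labels and labels[j] == c for j in cannot.get(i, ()))


def cop_kmeans(points: Sequence[Point], k: int, must_link: Sequence[Pair],
               cannot_link: Sequence[Pair], iters: int = 20,
               seed: int = 0) -> Optional[List[int]]:
    """Hypothetical constrained k-means; returns None if no feasible assignment."""
    must, cannot = _adjacency(must_link), _adjacency(cannot_link)
    rng = random.Random(seed)
    centers = [points[i] for i in rng.sample(range(len(points)), k)]
    labels: Dict[int, int] = {}
    for _ in range(iters):
        labels = {}
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; take the first legal one.
            for c in sorted(range(k), key=lambda c: _sq_dist(p, centers[c])):
                if not _violates(i, c, labels, must, cannot):
                    labels[i] = c
                    break
            else:
                return None  # every cluster would break some constraint
        new_centers = []
        for c in range(k):
            members = [points[i] for i in labels if labels[i] == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[c])
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return [labels[i] for i in range(len(points))]
```

Points are assigned greedily in index order, so the result can depend on that order; a real implementation might shuffle or restart on failure.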
Finding Deceptive Opinion Spam by Any Stretch of the Imagination
This work develops and compares three approaches to detecting deceptive opinion spam, yielding a classifier that is nearly 90% accurate on the authors' gold-standard opinion spam dataset, and reveals a relationship between deceptive opinions and imaginative writing.
Annotating Expressions of Opinions and Emotions in Language
The manual annotation process and the results of an inter-annotator agreement study on a 10,000-sentence corpus of articles drawn from the world press are presented.
Learning to Ask: Neural Question Generation for Reading Comprehension
An attention-based sequence learning model is proposed for the task, and the effect of encoding sentence- vs. paragraph-level information is investigated; results show that the system significantly outperforms the state-of-the-art rule-based system.
Clustering with Instance-Level Constraints
This paper proposes two types of instance-level clustering constraints (must-link and cannot-link) and shows how they can be incorporated into a clustering algorithm to aid its search for a good partition of the data.
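Stated as code, the two constraint types amount to simple checks on a finished partition: a must-link pair must share a cluster label, and a cannot-link pair must not. The sketch below assumes labels are a list indexed by instance and constraints are index pairs; the name `satisfies_constraints` is hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]


def satisfies_constraints(labels: Sequence[int], must_link: Iterable[Pair],
                          cannot_link: Iterable[Pair]) -> bool:
    """Return True if a cluster assignment respects every instance-level constraint."""
    # Must-link: both instances must end up in the same cluster.
    if any(labels[a] != labels[b] for a, b in must_link):
        return False
    # Cannot-link: the two instances must end up in different clusters.
    return all(labels[a] != labels[b] for a, b in cannot_link)
```

For example, `satisfies_constraints([0, 0, 1], [(0, 1)], [(1, 2)])` holds, while moving instance 1 into cluster 1 would break both constraints.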
Negative Deceptive Opinion Spam
This work creates and studies the first dataset of deceptive opinion spam with negative sentiment reviews, and finds that standard n-gram text categorization techniques can detect negative deceptive opinion spam with performance far surpassing that of human judges.
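As an illustration of what a "standard n-gram text categorization technique" can look like, the sketch below trains a multinomial naive Bayes classifier over unigram and bigram counts with add-one smoothing. The choice of naive Bayes, whitespace tokenization, the `n_max=2` cutoff, and the names `ngrams` and `NGramNaiveBayes` are all assumptions for illustration; the paper's exact features and classifier are not given here.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def ngrams(text: str, n_max: int = 2) -> List[str]:
    # Lowercased whitespace tokens, then all n-grams for n = 1..n_max.
    tokens = text.lower().split()
    feats: List[str] = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NGramNaiveBayes:
    """Hypothetical multinomial naive Bayes over n-gram counts."""

    def fit(self, docs: Sequence[str], labels: Sequence[str]) -> "NGramNaiveBayes":
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(ngrams(doc))
        self.vocab = set().union(*self.counts.values())
        self.n_docs = len(docs)
        return self

    def predict(self, doc: str) -> str:
        feats = ngrams(doc)
        best, best_score = self.classes[0], -math.inf
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.priors[c] / self.n_docs)  # class prior
            for f in feats:
                # Add-one (Laplace) smoothing over the training vocabulary.
                score += math.log((self.counts[c][f] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best
```

A toy usage with invented sentences: fitting on `["my husband and i hated it", "my husband hated the experience", "the bathroom was small", "the bathroom floor was dirty"]` labeled `deceptive, deceptive, truthful, truthful` classifies `"my husband hated it"` as `deceptive`, because its n-grams only occur in that class.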
OpinionFinder: A System for Subjectivity Analysis
OpinionFinder is a system that performs subjectivity analysis, automatically identifying when opinions, sentiments, speculations, and other private states are present in text.
DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension
DREAM is the first dialogue-based multiple-choice reading comprehension data set to focus on in-depth multi-turn multi-party dialogue understanding; experimental results on it show the effectiveness of dialogue structure and general world knowledge.
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification
An Adversarial Deep Averaging Network (ADAN) is proposed to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist.
Identifying Anaphoric and Non-Anaphoric Noun Phrases to Improve Coreference Resolution
We present a supervised learning approach to the identification of anaphoric and non-anaphoric noun phrases and show how such information can be incorporated into a coreference resolution system.