Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
TLDR
This work extends to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline.
Adaptive regularization of weight vectors
TLDR
Empirical evaluations show that AROW achieves state-of-the-art performance on a wide range of binary and multiclass tasks, as well as robustness in the face of non-separable data.
You Are What You Tweet: Analyzing Twitter for Public Health
TLDR
This work applies the recently introduced Ailment Topic Aspect Model to over one and a half million health-related tweets and discovers mentions of over a dozen ailments, including allergies, obesity, and insomnia, suggesting that Twitter has broad applicability for public health research.
Confidence-weighted linear classification
TLDR
Empirical evaluation on a range of NLP tasks shows that the confidence-weighted linear classifiers introduced here improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lend themselves to better classifier combination after parallel training.
Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings
TLDR
A new corpus of Weibo messages annotated for both name and nominal mentions is presented and a joint training objective for the embeddings that makes use of both (NER) labeled and unlabeled raw text is proposed.
Quantifying Mental Health Signals in Twitter
TLDR
A novel method for gathering data for a range of mental illnesses quickly and cheaply is presented, followed by an analysis focused on four in particular: post-traumatic stress disorder, depression, bipolar disorder, and seasonal affective disorder.
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
TLDR
This paper explores the broader cross-lingual potential of multilingual BERT (mBERT) as a zero-shot language transfer model on 5 NLP tasks (NLI, document classification, NER, POS tagging, and dependency parsing) covering a total of 39 languages from various language families.
Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate
TLDR
Whereas bots that spread malware and unsolicited content disseminated antivaccine messages, Russian trolls promoted discord, showing that directly confronting vaccine skeptics enables bots to legitimize the vaccine debate.
Annotating Named Entities in Twitter Data with Crowdsourcing
We describe our experience using both Amazon Mechanical Turk (MTurk) and CrowdFlower to collect simple named entity annotations for Twitter status updates. Unlike most genres that have traditionally
Measuring Post Traumatic Stress Disorder in Twitter
TLDR
This work considers PTSD, a serious condition that affects millions worldwide, with especially high rates among military veterans, and demonstrates the utility of its approach by examining differences in language use between PTSD and random individuals, building classifiers to separate these two groups, and using those classifiers to detect elevated rates of PTSD at and around U.S. military bases.