• Publications
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
TLDR
A list of criteria founded in critical race theory is provided; these are used to annotate a publicly available corpus of more than 16k tweets and to present a dictionary based on the most indicative words in the data.
Learning Whom to Trust with MACE
TLDR
MACE (Multi-Annotator Competence Estimation) learns in an unsupervised fashion to identify which annotators are trustworthy and predict the correct underlying labels, and shows considerable improvements over standard baselines, both for predicted label accuracy and trustworthiness estimates.
Personality Traits on Twitter—or—How to Get 1,500 Personality Tests in a Week
TLDR
The experiments show that social media data can provide sufficient linguistic evidence to reliably predict two of four personality dimensions, and a novel corpus of 1.2M English tweets annotated with Myers-Briggs personality type and gender is presented.
Identifying Metaphorical Word Use with Tree Kernels
TLDR
This work uses SVMs with tree kernels on a balanced corpus of 3,872 instances, created by bootstrapping from available metaphor lists, to identify metaphorical use, and outperforms two baselines: a sequential and a vector-based approach.
Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview
TLDR
This framework serves as an overview of predictive bias in NLP, integrating existing work into a single structure, and providing a conceptual baseline for improved frameworks.
User Review Sites as a Resource for Large-Scale Sociolinguistic Studies
TLDR
This research aims to remedy both problems by exploring a large new data source: international review websites with user profiles, which provide more text than manually collected corpora and more metadata than most available social media text.
SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)
TLDR
This task combines the labeling of multiword expressions and supersenses (coarse-grained classes) in an explicit, yet broad-coverage paradigm for lexical semantics in a multi-domain evaluation setting, indicating that the task remains largely unresolved.
What’s in a Preposition? Dimensions of Sense Disambiguation for an Interesting Word Class
TLDR
This work examines the parameters that must be considered in preposition sense disambiguation, namely context, features, and granularity, and delivers a performance that significantly improves over two state-of-the-art systems, showing potential to improve other word sense disambiguation tasks as well.
Multitask Learning for Mental Health Conditions with Limited Social Media Data
TLDR
The proposed framework significantly improves over all baselines and single-task models for predicting mental health conditions, with the largest gains for conditions with limited data, and establishes for the first time the potential of deep learning for predicting mental health from online user-generated text.
Demographic Factors Improve Classification Performance
TLDR
Including age or gender information in text-classification tasks consistently and significantly improves performance over demographic-agnostic models, which are commonly used in natural language processing.
...