Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith

The perceived toxicity of language can vary based on someone’s identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases. We seek to understand the *who*, *why*, and *what* behind biases in toxicity annotations. In two online studies with demographically and politically diverse participants, we investigate the effect of annotator identities (*who*) and beliefs (*why*), drawing from social psychology research about… 

Assessing Annotator Identity Sensitivity via Item Response Theory: A Case Study in a Hate Speech Corpus

This work utilizes item response theory (IRT), a methodological approach developed for measurement theory, to quantify annotator identity sensitivity, and uses three different IRT techniques to assess whether an annotator’s racial identity is associated with their ratings on comments that target different racial identities.

Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information

The paper aims to improve the annotation process for more efficient and inclusive NLP systems through a novel disagreement prediction mechanism and shows that knowing annotators’ demographic information, like gender, ethnicity, and education level, helps predict disagreements.

Noise Audits Improve Moral Foundation Classification

Two metrics to audit the noise of annotations are proposed and experiments show that removing noisy annotations based on the proposed metrics improves classification performance.

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks

This paper proposes a novel DIALBIAS FRAME for analyzing social bias in conversations pragmatically, which considers more comprehensive bias-related analyses rather than simple dichotomy annotations, and introduces the CDIAL-BIAS DATASET, the first well-annotated Chinese social bias dialog dataset.

Impact of Annotator Demographics on Sentiment Dataset Labeling

It is shown that demographic differences among annotators have a significant effect on their ratings, and that these effects also occur in each component modality of multimodal sentiment data.

Estimating Ground Truth in a Low-labelled Data Regime: A Study of Racism Detection in Spanish

This study analyses a new dataset for detecting racism in Spanish, focusing on estimating ground truth given few labels and high disagreement, and shows better performance at lower thresholds for classifying messages as racist.

The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism

The Measuring Hate Speech corpus, a dataset created to measure hate speech while adjusting for annotators’ perspectives, is introduced, facilitating analyses of interactions between annotator- and comment-level identities, i.e. identity-related annotator perspective.

Addressing religious hate online: from taxonomy creation to automated detection

A fine-grained labeling scheme for religious hate speech detection is proposed; it is based on a wider, highly interoperable taxonomy of abusive language and covers the three main monotheistic religions: Judaism, Christianity, and Islam.

Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models

This work proposes tracking annotator heuristic traces and suggests that tracking heuristic usage among annotators can potentially help with collecting challenging datasets and diagnosing model biases.

Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

It is argued that more care is needed to construct training corpora for language models with better transparency and justification for the inclusion or exclusion of various texts, and that privileging any corpus as high quality entails a language ideology.



Hatred is in the Eye of the Annotator: Hate Speech Classifiers Learn Human-Like Social Stereotypes

The results demonstrate that hate speech classifiers learn human-like biases which can further perpetuate social inequalities when propagated at scale, and provide insights into additional sources of bias in hate speech moderation, informing ongoing debates regarding fairness in machine learning.

Whose Opinions Matter? Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection

An in-depth study models polarized opinions coming from different communities, under the hypothesis that similar characteristics can influence the perspectives of annotators on a certain phenomenon, and shows how this approach improves the prediction performance of a state-of-the-art supervised classifier.

The Risk of Racial Bias in Hate Speech Detection

This work proposes *dialect* and *race priming* as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.

Identifying and Measuring Annotator Bias Based on Annotators’ Demographic Characteristics

This work investigates annotator bias using classification models trained on data from demographically distinct annotator groups, and shows that demographic features, such as first language, age, and education, correlate with significant performance differences.

Social Bias Frames: Reasoning about Social and Power Implications of Language

It is found that while state-of-the-art neural models are effective at high-level categorization of whether a given statement projects unwanted social bias, they are not effective at spelling out more detailed explanations in terms of Social Bias Frames.

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

It is shown that model performance improves when training with annotator identifiers as features, that models are able to recognize the most productive annotators, and that models often do not generalize well to examples from annotators who did not contribute to the training set.

Political psycholinguistics: A comprehensive analysis of the language habits of liberal and conservative social media users.

For nearly a century social scientists have sought to understand left-right ideological differences in values, motives, and thinking styles. Much progress has been made, but, as in other areas of…

Language (Technology) is Power: A Critical Survey of “Bias” in NLP

A greater recognition of the relationships between language and social hierarchies is urged, encouraging researchers and practitioners to articulate their conceptualizations of “bias” and to center work around the lived experiences of members of communities affected by NLP systems.

Ground-Truth, Whose Truth? - Examining the Challenges with Annotating Toxic Text Datasets

This work re-annotates samples from three toxic text datasets and finds that a multi-label approach to annotating toxic text samples can help to improve dataset quality and capture dependence on context and diversity in annotators.

Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter

It is found that amateur annotators are more likely than expert annotators to label items as hate speech, and that systems trained on expert annotations outperform systems trained on amateur annotations.