Corpus ID: 244117167

Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

@article{Sap2021AnnotatorsWA,
  title={Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection},
  author={Maarten Sap and Swabha Swayamdipta and Laura Vianna and Xuhui Zhou and Yejin Choi and Noah A. Smith},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.07997}
}
Warning: this paper discusses and contains content that is offensive or upsetting. The perceived toxicity of language can vary based on someone’s identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases. We seek to understand the who, why, and what behind biases in toxicity annotations. In two online studies with demographically and politically diverse participants, we investigate the effect of annotator…
Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks
TLDR
This paper proposes a novel DIALBIAS FRAME for analyzing social bias in conversations pragmatically, which supports more comprehensive bias-related analyses than simple dichotomous annotations, and introduces the CDIAL-BIAS DATASET, the first well-annotated Chinese social bias dialog dataset.
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
TLDR
It is argued that more care is needed to construct training corpora for language models with better transparency and justification for the inclusion or exclusion of various texts, and that privileging any corpus as high quality entails a language ideology.
Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy
TLDR
Using empathetic data, this work improves over recent work on controllable text generation that aims to reduce the toxicity of generated text and observes that the degree of improvement is subject to specific communication components of empathy.
Describing Differences between Text Distributions with Natural Language
TLDR
GPT-3 is applied to describe distribution shifts, debug dataset shortcuts, summarize unknown tasks, and label text clusters, and analyses based on the automatically generated descriptions are presented.
You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings
TLDR
Three dimensions of developing multilingual bias evaluation frameworks are highlighted: increasing transparency through documentation, expanding targets of bias beyond gender, and addressing cultural differences that exist between languages.
Raison d’être of the benchmark dataset: A Survey of Current Practices of Benchmark Dataset Sharing Platforms
TLDR
This paper critically examines current practices of benchmark dataset sharing in NLP and proposes that benchmark datasets should include social impact metadata and that data curators should take a role in managing that metadata.
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
TLDR
It is argued that dataset creators should explicitly aim for one or the other of the two contrasting paradigms for data annotation to facilitate the intended use of their dataset.
KOLD: Korean Offensive Language Dataset
Warning: this paper contains content that may be offensive or upsetting. Although large attention has been paid to the detection of hate speech, most work has been done in English, failing to make it…
Misinfo Reaction Frames: Reasoning about Readers’ Reactions to News Headlines
TLDR
This work demonstrates the feasibility and importance of pragmatic inferences on news headlines to help enhance AI-guided misinformation detection and mitigation and introduces a Misinfo Reaction Frames corpus, a crowdsourced dataset of reactions to over 25k news headlines focusing on global crises.
Mix and Match: Learning-free Controllable Text Generation using Energy Language Models
TLDR
This work proposes Mix and Match LM, a global score-based alternative for controllable text generation that combines arbitrary pre-trained black-box models for achieving the desired attributes in the generated text without involving any fine-tuning or structural assumptions about the black-box models.

References

SHOWING 1-10 OF 115 REFERENCES
Hatred is in the Eye of the Annotator: Hate Speech Classifiers Learn Human-Like Social Stereotypes
TLDR
The results demonstrate that hate speech classifiers learn human-like biases which can further perpetuate social inequalities when propagated at scale, and provide insights into additional sources of bias in hate speech moderation, informing ongoing debates regarding fairness in machine learning.
The Risk of Racial Bias in Hate Speech Detection
TLDR
This work proposes dialect priming and race priming as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.
Whose Opinions Matter? Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection
TLDR
An in-depth study models polarized opinions coming from different communities, under the hypothesis that shared characteristics can influence annotators’ perspectives on a certain phenomenon, and shows how this approach improves the prediction performance of a state-of-the-art supervised classifier.
Identifying and Measuring Annotator Bias Based on Annotators’ Demographic Characteristics
TLDR
This work investigates annotator bias using classification models trained on data from demographically distinct annotator groups, and shows that demographic features, such as first language, age, and education, correlate with significant performance differences.
Social Bias Frames: Reasoning about Social and Power Implications of Language
TLDR
It is found that while state-of-the-art neural models are effective at high-level categorization of whether a given statement projects unwanted social bias, they are not effective at spelling out more detailed explanations in terms of Social Bias Frames.
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
TLDR
It is shown that model performance improves when training with annotator identifiers as features, that models are able to recognize the most productive annotators, and that models often do not generalize well to examples from annotators who did not contribute to the training set.
Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter
TLDR
It is found that amateur annotators are more likely than expert annotators to label items as hate speech, and that systems trained on expert annotations outperform systems trained on amateur annotations.
Ground-Truth, Whose Truth? - Examining the Challenges with Annotating Toxic Text Datasets
TLDR
The authors re-annotate samples from three toxic text datasets and find that a multi-label approach to annotating toxic text samples can help to improve dataset quality and capture dependence on context and diversity among annotators.
On Releasing Annotator-Level Labels and Information in Datasets
TLDR
It is empirically demonstrated that label aggregation may introduce representational biases of individual and group perspectives, and a set of recommendations for increased utility and transparency of datasets for downstream use cases is proposed.
Language (Technology) is Power: A Critical Survey of “Bias” in NLP
TLDR
A greater recognition of the relationships between language and social hierarchies is urged, encouraging researchers and practitioners to articulate their conceptualizations of “bias” and to center work around the lived experiences of members of communities affected by NLP systems.