Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?

Rochelle Choenni, Ekaterina Shutova, Robert van Rooij
In this paper, we investigate what types of stereotypical information are captured by pretrained language models. We present the first dataset comprising stereotypical attributes of a range of social groups and propose a method to elicit stereotypes encoded by pretrained language models in an unsupervised fashion. Moreover, we link the emergent stereotypes to their manifestation as basic emotions as a means to study their emotional effects in a more generalized manner. To demonstrate how our… 
StereoKG: Data-Driven Knowledge Graph Construction For Cultural Knowledge and Stereotypes
This study presents a fully data-driven pipeline for generating a knowledge graph of cultural knowledge and stereotypes and shows that performing intermediate masked language model training on the verbalized KG leads to a higher level of cultural awareness in the model and has the potential to increase classification performance on knowledge-crucial samples on a related task, i.e., hate speech detection.
Pipelines for Social Bias Testing of Large Language Models
This short paper suggests how to use verification techniques in development pipelines by taking inspiration from software testing and addressing social bias evaluation as software testing.
The Birth of Bias: A case study on the evolution of gender bias in an English language model
It is found that the representation of gender is dynamic and identify different phases during training, and it is shown that gender information is represented increasingly locally in the input embeddings of the model and that debiasing these can be effective in reducing the downstream bias.
StereoSet: Measuring stereotypical bias in pretrained language models
StereoSet, a large-scale natural English dataset to measure stereotypical biases in four domains: gender, profession, race, and religion, is presented, and it is shown that popular models like BERT, GPT-2, RoBERTa, and XLNet exhibit strong stereotypical biases.
Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models
This paper proposes two intuitive metrics, skew and stereotype, that quantify and analyse the gender bias present in contextual language models when tackling the WinoBias pronoun resolution task.
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
It is found that all three of the widely used MLMs the authors evaluate substantially favor sentences that express stereotypes in every category of CrowS-Pairs, a benchmark for measuring some forms of social bias in language models against protected demographic groups in the US.
Measuring Bias in Contextualized Word Representations
A template-based method to quantify bias in BERT is proposed and it is shown that this method obtains more consistent results in capturing social biases than the traditional cosine based method.
On Measuring Social Biases in Sentence Encoders
The Word Embedding Association Test is extended to measure bias in sentence encoders, yielding mixed results, including suspicious patterns of sensitivity that suggest the test's assumptions may not hold in general.
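For context, the Word Embedding Association Test (WEAT) measures the differential association of two target word sets (X, Y) with two attribute sets (A, B) via cosine similarity, reporting a Cohen's-d-style effect size. The sketch below is a minimal illustration of that standard formulation with toy vectors, not the paper's sentence-encoder extension or its exact implementation:

```python
import numpy as np

def cos(u, v):
    # cosine similarity between two vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    # s(w, A, B): mean similarity of w to attribute set A minus to set B
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Effect size: difference of mean associations of the two target sets,
    # normalized by the pooled standard deviation over X and Y
    s = [assoc(w, A, B) for w in X + Y]
    return (np.mean(s[:len(X)]) - np.mean(s[len(X):])) / np.std(s, ddof=1)

# Toy example: X aligns with attribute A, Y aligns with attribute B,
# so the effect size should be positive (a stereotypical association).
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]
X = [np.array([1.0, 0.1])]
Y = [np.array([0.1, 1.0])]
print(weat_effect_size(X, Y, A, B))
```

In the sentence-encoder setting studied here, the word vectors above would be replaced by encoder outputs for template sentences containing the target and attribute terms; the statistic itself is unchanged.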
Linguistic Intergroup Bias: Stereotype Perpetuation Through Language
Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia
The results show systematic differences in how the LGBT community is portrayed across languages, surfacing cultural differences in narratives and signs of social biases.
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
A data-augmentation approach is demonstrated that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by rule-based, feature-rich, and neural coreference systems in WinoBias without significantly affecting their performance on existing datasets.
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
The Risk of Racial Bias in Hate Speech Detection
This work proposes *dialect* and *race priming* as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.