Persistent Anti-Muslim Bias in Large Language Models

@article{Abid2021PersistentAB,
  title={Persistent Anti-Muslim Bias in Large Language Models},
  author={Abubakar Abid and Maheen Saleem Farooqi and James Y. Zou},
  journal={Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society},
  year={2021}
}
It has been observed that large-scale language models capture undesirable societal biases, e.g. relating to race and gender; yet religious bias has been relatively unexplored. We demonstrate that GPT-3, a state-of-the-art contextual language model, captures persistent Muslim-violence bias. We probe GPT-3 in various ways, including prompt completion, analogical reasoning, and story generation, to understand this anti-Muslim bias, demonstrating that it appears consistently and creatively in… 
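As a rough illustration of the prompt-completion probing described in the abstract, the sketch below repeatedly completes a "Two Muslims walked into a ..." style prompt and counts violence-related completions. It substitutes GPT-2 via the Hugging Face pipeline for GPT-3 (which the paper accessed through the OpenAI API); the prompt and the violence word list are illustrative, not the paper's exact stimuli.

# Minimal sketch of the prompt-completion probe, with GPT-2 standing in for GPT-3.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Two Muslims walked into a"
violence_words = {"shot", "shooting", "killed", "bomb", "bombing", "attack", "terror"}

# Sample many continuations and count how often violence-related words appear.
completions = generator(prompt, max_new_tokens=20, num_return_sequences=25,
                        do_sample=True, pad_token_id=50256)

violent = sum(
    any(w in c["generated_text"].lower() for w in violence_words)
    for c in completions
)
print(f"{violent}/{len(completions)} completions contain violence-related words")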

Citations

Intersectional Bias in Causal Language Models
TLDR
It is suggested that technical and community-based approaches need to be combined to acknowledge and address complex and intersectional language model bias.
THE New York Times DISTORTS
In this study, I prove a history of bias against Palestine in a newspaper of international importance — the New York Times — during the First and Second Palestinian Intifadas. Using state-of-the-art
Computational Modeling of Stereotype Content in Text
Stereotypes are encountered every day, in interpersonal communication as well as in entertainment, news stories, and on social media. In this study, we present a computational method to mine large,
Roadblocks in Gender Bias Measurement for Diachronic Corpora
The use of word embeddings is an important NLP technique for extracting meaningful conclusions from corpora of human text. One important question that has been raised about word embeddings is the
EXIST 2021: Inducing Bias in Deep Learning Models
TLDR
For the EXIST 2021 sexism detection task, a novel approach is proposed that trains the same base model three times, generating three versions of the model; the outputs of these models jointly determine whether or not a tweet is marked as sexist.
The Ghost in the Machine has an American accent: value conflict in GPT-3
The alignment problem in the context of large language models must consider the plurality of human values in our world. Whilst there are many resonant and overlapping values amongst the world’s
Gender and Representation Bias in GPT-3 Generated Stories
Using topic modeling and lexicon-based word similarity, we find that stories generated by GPT-3 exhibit many known gender stereotypes. Generated stories depict different topics and descriptions
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
TLDR
This paper demonstrates a surprising finding: Pretrained language models recognize, to a considerable degree, their undesirable biases and the toxicity of the content they produce and proposes a decoding algorithm that reduces the probability of a language model producing problematic text, known as self-debiasing.
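A highly simplified sketch of that decoding-time idea, assuming GPT-2 as the model: the next-token distribution is compared with and without a prefix naming the undesired attribute, and tokens that become more likely under that prefix are suppressed. The prefix text and the hard zeroing below are illustrative stand-ins for the paper's soft scaling function.

# Simplified decoding-time self-debiasing step (illustrative, not the exact algorithm).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def next_token_probs(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

prompt = "Two Muslims walked into a"
biased_prefix = "The following text contains violence:\n"  # names the undesired attribute

p = next_token_probs(prompt)
p_biased = next_token_probs(biased_prefix + prompt)

# Tokens that become *more* probable when the model is primed for the
# undesired attribute are treated as problematic and suppressed.
debiased = torch.where(p_biased > p, torch.zeros_like(p), p)
debiased = debiased / debiased.sum()

print("plain:   ", tokenizer.decode(int(p.argmax())))
print("debiased:", tokenizer.decode(int(debiased.argmax())))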
Large pre-trained language models contain human-like biases of what is right and wrong to do
TLDR
The capabilities of the "moral direction" for guiding (even other) LMs towards producing normative text are demonstrated and showcased on the RealToxicityPrompts testbed, preventing neural toxic degeneration in GPT-2.
Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias
TLDR
The attempt to draw a comprehensive view of bias in pre-trained language models, and especially the exploration of affective bias, will be highly beneficial to researchers interested in this evolving field.

References

SHOWING 1-10 OF 22 REFERENCES
StereoSet: Measuring stereotypical bias in pretrained language models
TLDR
StereoSet, a large-scale natural English dataset to measure stereotypical biases in four domains: gender, profession, race, and religion, is presented, and it is shown that popular models like BERT, GPT-2, RoBERTa, and XLNet exhibit strong stereotypical biases.
Identifying and Reducing Gender Bias in Word-Level Language Models
TLDR
This study proposes a metric to measure gender bias and proposes a regularization loss term for the language model that minimizes the projection of encoder-trained embeddings onto an embedding subspace that encodes gender and finds this regularization method to be effective in reducing gender bias.
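A toy sketch of a regularizer in that spirit: penalize the squared projection of the embedding matrix onto a gender direction and add the penalty to the language-model loss. Defining the direction from a single he/she difference vector and the weight lam are illustrative simplifications of the paper's formulation.

import torch

def gender_bias_penalty(embedding_weight, he_id, she_id, lam=0.1):
    # Unit gender direction from one definitional pair (illustrative).
    g = embedding_weight[he_id] - embedding_weight[she_id]
    g = g / g.norm()
    proj = embedding_weight @ g        # projection of every embedding onto the direction
    return lam * proj.pow(2).mean()    # added to the usual LM training loss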
Gender Bias in Neural Natural Language Processing
TLDR
It is empirically shown that CDA effectively decreases gender bias while preserving accuracy, and it is found that as training proceeds on the original data set with gradient descent, the gender bias grows as the loss decreases, indicating that the optimization encourages bias; CDA mitigates this behavior.
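Counterfactual data augmentation (CDA) can be sketched as a simple swap of gendered words, doubling the corpus with gender-flipped copies; the word-pair table below is illustrative and far from exhaustive.

# Minimal CDA sketch: duplicate each sentence with gendered words swapped.
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "man": "woman", "woman": "man"}

def counterfactual(sentence: str) -> str:
    return " ".join(SWAPS.get(tok.lower(), tok) for tok in sentence.split())

corpus = ["He worked as a doctor", "She worked as a nurse"]
augmented = corpus + [counterfactual(s) for s in corpus]
print(augmented)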
The Woman Worked as a Babysitter: On Biases in Language Generation
TLDR
The notion of the regard towards a demographic is introduced, the varying levels of regard towards different demographics are used as a defining metric for bias in NLG, and the extent to which sentiment scores are a relevant proxy metric for regard is analyzed.
Reducing Gender Bias in Word-Level Language Models with a Gender-Equalizing Loss Function
TLDR
This research introduces a new term to the loss function which attempts to equalize the probabilities of male and female words in the output, and provides empirical evidence that this approach successfully mitigates gender bias in language models without increasing perplexity.
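A rough sketch of an output-equalizing term in that spirit, assuming precomputed vocabulary indices for paired male and female words: penalize the gap between the probability mass the model assigns to the two groups. The exact pairing and weighting in the paper differ; this shows only the shape of the idea.

import torch

def equalizing_loss(log_probs, male_ids, female_ids, lam=1.0):
    # log_probs: (batch, vocab) next-word log-probabilities from the LM.
    p_male = log_probs[:, male_ids].exp().sum(dim=-1)
    p_female = log_probs[:, female_ids].exp().sum(dim=-1)
    return lam * (p_male.log() - p_female.log()).pow(2).mean()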
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
TLDR
This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
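The core "neutralize" step can be sketched as removing the component of each word vector along a gender direction; the paper additionally learns that direction from several definitional pairs and applies an equalize step, so the single he/she pair below is a simplification.

import numpy as np

def neutralize(vectors, v_he, v_she):
    g = v_he - v_she
    g = g / np.linalg.norm(g)                  # unit gender direction
    return vectors - np.outer(vectors @ g, g)  # strip the gendered component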
Self-Critical Sequence Training for Image Captioning
TLDR
This paper considers the problem of optimizing image captioning systems using reinforcement learning, and shows that by carefully optimizing systems using the test metrics of the MSCOCO task, significant gains in performance can be realized.
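The self-critical baseline can be sketched in a few lines: the reward of the greedily decoded caption serves as the baseline for a sampled caption, so only samples that beat the model's own test-time output are reinforced. The reward function stands in for a metric such as CIDEr, and the surrounding training loop is omitted.

def scst_loss(sample_log_prob, sample_caption, greedy_caption, reference, reward):
    # REINFORCE with the greedy caption's reward as a self-critical baseline.
    advantage = reward(sample_caption, reference) - reward(greedy_caption, reference)
    return -advantage * sample_log_prob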
Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems
TLDR
Three merging rules for combining probability distributions are examined: the well-known mixture rule, the logarithmic rule, and a novel product rule, which were applied with state-of-the-art results to two problems commonly used to assess human mastery of lexical semantics: synonym questions and analogy questions.
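The mixture and logarithmic rules have simple closed forms, sketched below; the paper's weighted product rule is reduced here to a plain normalized product for brevity, so treat that variant as an approximation.

import numpy as np

def mixture(dists, weights):
    # Weighted arithmetic mean of the modules' probability distributions.
    return np.average(dists, axis=0, weights=weights)

def logarithmic(dists, weights):
    # Weighted geometric mean, renormalized.
    p = np.exp(np.average(np.log(dists), axis=0, weights=weights))
    return p / p.sum()

def product(dists):
    # Plain normalized product (the paper's rule also weights each module).
    p = np.prod(dists, axis=0)
    return p / p.sum()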
Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild
TLDR
An open-source Python package, Gradio, which allows researchers to rapidly generate a visual interface for their ML models, and carries out a case study to understand Gradio's usefulness and usability in the setting of a machine learning collaboration between a researcher and a cardiologist.
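A minimal Gradio example in the spirit of that summary: wrap a prediction function in a shareable web interface with a single Interface call. The toy classifier is illustrative.

import gradio as gr

def classify(text: str) -> str:
    # Trivial stand-in for a real ML model.
    return "positive" if "good" in text.lower() else "negative"

gr.Interface(fn=classify, inputs="text", outputs="text").launch()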
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely is proposed, which generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
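The scaled dot-product attention at the core of the Transformer, softmax(QK^T / sqrt(d_k)) V, written out for a single head:

import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                                       # weighted sum of values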