You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings

Authors: Zeerak Talat, Aurélie Névéol, Stella Rose Biderman, Miruna Clinciu, Manan Dey, S. Longpre, Alexandra Sasha Luccioni, Maraim Masoud, Margaret Mitchell, Dragomir R. Radev, Shanya Sharma, Arjun Subramonian, Jaesung Tae, Samson Tan, Deepak R. Tunuguntla, Oskar van der Wal

Published in: Proceedings of BigScience Episode #5 - Workshop on Challenges & Perspectives in Creating Large Language Models
Evaluating bias, fairness, and social impact in monolingual language models is a difficult task. This challenge is further compounded when language modeling occurs in a multilingual context. Considering the implications of evaluation biases for large multilingual language models, we situate the discussion of bias evaluation within the wider context of social scientific research with computational work. We highlight three dimensions of developing multilingual bias evaluation frameworks: (1) …


Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
In an effort to guarantee that machine learning model outputs conform with human moral values, recent work has begun exploring the possibility of explicitly training models to learn the difference
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.


Socially Aware Bias Measurements for Hindi Language Representations
This work investigates biases present in Hindi language representations, focusing on caste- and religion-associated biases, and demonstrates how biases are unique to language representations based on the history and culture of the region in which they are widely spoken.
Evaluating Gender Bias in Natural Language Inference
This work proposes an evaluation methodology to measure gender-bias stereotypes by constructing a challenge task that pairs a gender-neutral premise against a gender-specific hypothesis, and suggests that three models trained on the MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
Towards Understanding and Mitigating Social Biases in Language Models
The empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information for high-fidelity text generation, thereby pushing forward the performance-fairness Pareto frontier.
Measuring Bias in Contextualized Word Representations
A template-based method to quantify bias in BERT is proposed, and it is shown that this method obtains more consistent results in capturing social biases than the traditional cosine-based method.
What do Bias Measures Measure?
This work presents a comprehensive survey of existing bias measures in NLP as a function of the associated NLP tasks, metrics, datasets, and social biases and corresponding harms and proposes a documentation standard for bias measures to aid their development, categorization, and appropriate usage.
Intersectional Bias in Causal Language Models
It is suggested that technical and community-based approaches need to be combined to acknowledge and address complex and intersectional language model bias.
A Survey on Gender Bias in Natural Language Processing
A survey of 304 papers on gender bias in natural language processing finds that research on gender bias suffers from four core limitations, and sees overcoming these limitations as a necessary direction for future research.
Societal Biases in Language Generation: Progress and Challenges
A survey on societal biases in language generation is presented, focusing on how data and techniques contribute to biases and progress towards reducing biases, and the effects of decoding techniques.
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
It is found that all three of the widely-used MLMs the authors evaluate substantially favor sentences that express stereotypes in every category in CrowS-Pairs, a benchmark for measuring some forms of social bias in language models against protected demographic groups in the US.
Mitigating Gender Bias in Natural Language Processing: Literature Review
This paper discusses gender bias based on four forms of representation bias, analyzes methods for recognizing gender bias in NLP, and discusses the advantages and drawbacks of existing gender debiasing methods.