Undesirable biases in NLP: Averting a crisis of measurement

Oskar van der Wal, Dominik Bachmann, Alina Leidinger, Leendert van Maanen, Willem Zuidema, Katrin Schulz

As Natural Language Processing (NLP) technology rapidly develops and spreads into daily life, it becomes crucial to anticipate how its use could harm people. However, our ways of assessing the biases of NLP models have not kept up. While the detection of English gender bias in such models in particular has enjoyed increasing research attention, many of the measures face serious problems: it is often unclear what they actually measure and how much they are subject to measurement error. In this…


Inseq: An Interpretability Toolkit for Sequence Generation Models

This work introduces Inseq, a Python library to democratize access to interpretability analyses of sequence generation models, and showcases its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2.

On Measures of Biases and Harms in NLP

Recent studies show that Natural Language Processing (NLP) technologies propagate societal biases about demographic groups associated with attributes such as gender, race, and nationality. To create…

Language (Technology) is Power: A Critical Survey of “Bias” in NLP

A greater recognition of the relationships between language and social hierarchies is urged, encouraging researchers and practitioners to articulate their conceptualizations of “bias” and to center work around the lived experiences of members of communities affected by NLP systems.

Social Biases in NLP Models as Barriers for Persons with Disabilities

Evidence is presented of undesirable biases towards mentions of disability in two different English language models, one for toxicity prediction and one for sentiment analysis, and it is demonstrated that the neural embeddings that are the critical first step in most NLP pipelines similarly contain undesirable biases.

Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview

This framework serves as an overview of predictive bias in NLP, integrating existing work into a single structure, and providing a conceptual baseline for improved frameworks.

Mitigating Gender Bias in Natural Language Processing: Literature Review

This paper discusses gender bias in terms of four forms of representation bias, analyzes methods for recognizing gender bias in NLP, and discusses the advantages and drawbacks of existing gender debiasing methods.

A Survey on Gender Bias in Natural Language Processing

A survey of 304 papers on gender bias in natural language processing finds that research on gender bias suffers from four core limitations and sees overcoming these limitations as a necessary development in future research.

Assessing Social and Intersectional Biases in Contextualized Word Representations

Evaluating bias effects at the contextual word level captures biases that are not captured at the sentence level, confirming the need for this novel approach.

Measuring Bias in Contextualized Word Representations

A template-based method to quantify bias in BERT is proposed, and it is shown that this method obtains more consistent results in capturing social biases than the traditional cosine-based method.

Gender Bias and Under-Representation in Natural Language Processing Across Human Languages

Extensions to profession-level and corpus-level gender bias metric calculations originally designed for English are developed and applied to 8 other languages, including languages such as Spanish, Arabic, German, French and Urdu that have grammatically gendered nouns, with distinct feminine, masculine and neuter profession words.

Cross-geographic Bias Detection in Toxicity Modeling

A weakly supervised method to robustly detect lexical biases in broader geocultural contexts is introduced and it is demonstrated that these groupings reflect human judgments of offensive and inoffensive language in those geographic contexts.