On Measuring and Mitigating Biased Inferences of Word Embeddings

Sunipa Dev, Tao Li, J. M. Phillips, and Vivek Srikumar. AAAI Conference on Artificial Intelligence.
Word embeddings carry stereotypical connotations from the text they are trained on, which can lead to invalid inferences in downstream models that rely on them. We use this observation to design a mechanism for measuring stereotypes using the task of natural language inference. We demonstrate a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe). Further, we show that for gender bias, these techniques extend to contextualized embeddings when applied… 
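Below is a minimal sketch of how NLI-based stereotype measurements could be aggregated. It assumes an NLI model has already scored many premise/hypothesis pairs that should be neutral. For example, the premise might mention an occupation and the hypothesis swap in a gendered word. It also assumes the model returns probabilities in the order (entailment, neutral, contradiction). Low neutral mass on such pairs signals a stereotype-driven, invalid inference. The function names and the threshold parameter `tau` are illustrative choices for this sketch, not quoted from the paper.

```python
from typing import Sequence, Tuple

# Assumed label order for each prediction: (entailment, neutral, contradiction).
Probs = Tuple[float, float, float]
NEUTRAL = 1


def _require(preds: Sequence[Probs]) -> None:
    if not preds:
        raise ValueError("need at least one prediction")


def net_neutral(preds: Sequence[Probs]) -> float:
    """Average probability assigned to 'neutral' across all pairs."""
    _require(preds)
    return sum(p[NEUTRAL] for p in preds) / len(preds)


def fraction_neutral(preds: Sequence[Probs]) -> float:
    """Fraction of pairs whose most probable label is 'neutral'."""
    _require(preds)
    return sum(1 for p in preds if p.index(max(p)) == NEUTRAL) / len(preds)


def threshold_neutral(preds: Sequence[Probs], tau: float = 0.5) -> float:
    """Fraction of pairs whose neutral probability exceeds tau."""
    _require(preds)
    return sum(1 for p in preds if p[NEUTRAL] > tau) / len(preds)


# Hypothetical scores for three pairs that should all be neutral.
preds = [(0.1, 0.8, 0.1), (0.6, 0.3, 0.1), (0.2, 0.5, 0.3)]
print(round(net_neutral(preds), 3))             # 0.533
print(round(fraction_neutral(preds), 3))        # 0.667
print(round(threshold_neutral(preds, 0.5), 3))  # 0.333
```

For a model that makes no stereotyped inferences on these pairs, all three scores would approach 1.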

OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

This work proposes OSCaR (Orthogonal Subspace Correction and Rectification), a bias-mitigation method that disentangles biased associations between concepts instead of removing concepts wholesale.

Extensive study on the underlying gender bias in contextualized word embeddings

This study points out the advantages and limitations of the various evaluation measures in use and aims to standardize the evaluation of gender bias in contextualized word embeddings.

Sense Embeddings are also Biased – Evaluating Social Biases in Static and Contextualised Sense Embeddings

Sense embedding learning methods learn different embeddings for the different senses of an ambiguous word. One sense of an ambiguous word might be socially biased while its other senses remain…

Masked Language Models as Stereotype Detectors?

This work exploits implicit knowledge of stereotypes to build an end-to-end stereotype detector using solely a language model, and focuses on measuring stereotypes at the data level, computing bias scores for natural language sentences and documents.

Marked Attribute Bias in Natural Language Inference

This work presents a new observation of gender bias in a downstream NLP application, marked attribute bias in natural language inference, and proposes a new post-processing debiasing scheme for static word embeddings.

MABEL: Attenuating Gender Bias using Textual Entailment Data

This work proposes MABEL (a Method for Attenuating Gender Bias using Entailment Labels), an intermediate pre-training approach for mitigating gender bias in contextualized representations, and introduces an alignment regularizer that pulls identical entailment pairs along opposite gender directions closer.

Debiasing Pre-trained Contextualised Embeddings

This work proposes a fine-tuning method that can be applied at the token or sentence level to debias pre-trained contextualised embeddings, and finds that applying token-level debiasing to all tokens and across all layers of a contextualised embedding model produces the best performance.

Iterative adversarial removal of gender bias in pretrained word embeddings

This paper proposes an iterative and adversarial procedure to remove gender influence from word representations that should otherwise be free of it, while retaining meaningful gender information in words that are inherently charged with gender polarity.

VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

Word vector embeddings have been shown to contain and amplify biases in data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these…

UNQOVERing Stereotypical Biases via Underspecified Questions

This work presents UNQOVER, a general framework to probe and quantify biases through underspecified questions, and shows that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence.



Attenuating Bias in Word Vectors

This work explores new, simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them, verifies that names are masked carriers of gender bias, and then uses names as a tool to attenuate bias in embeddings.
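One simple way to remove a bias from word vectors is linear projection. Estimate a unit bias direction v_B, then subtract each vector's component along it: w' = w - <w, v_B> v_B. The sketch below assumes the direction is the normalized mean difference of paired words. Per the summary above, those pairs could be gendered names, but the pairing and the averaging rule are assumptions here, and the paper's exact procedure may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Sequence[float]) -> Vector:
    n = math.sqrt(dot(v, v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def bias_direction(emb: Dict[str, Vector], pairs: Sequence[Tuple[str, str]]) -> Vector:
    """Unit vector along the mean difference of paired words.

    Assumption for this sketch: pairs could be gendered names or word pairs
    such as ("he", "she"); averaging their differences is one simple choice.
    """
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for a, b in pairs:
        for i in range(dim):
            total[i] += emb[a][i] - emb[b][i]
    return normalize(total)


def project_out(w: Sequence[float], v_b: Sequence[float]) -> Vector:
    """Linear projection w' = w - <w, v_b> v_b (v_b must have unit length)."""
    c = dot(w, v_b)
    return [wi - c * vi for wi, vi in zip(w, v_b)]


# Toy 3-d vectors, for illustration only.
emb = {"he": [1.0, 0.2, 0.0], "she": [-1.0, 0.2, 0.0], "nurse": [-0.4, 0.5, 0.7]}
v_b = bias_direction(emb, [("he", "she")])
print(v_b)                             # [1.0, 0.0, 0.0]
print(project_out(emb["nurse"], v_b))  # [0.0, 0.5, 0.7]
```

After projection, every vector is orthogonal to v_B. Information in all other directions is left unchanged.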

A Transparent Framework for Evaluating Unintended Demographic Bias in Word Embeddings

This work presents a transparent framework and metric for evaluating discrimination across protected groups with respect to their word embedding bias, based on the relative negative sentiment associated with demographic identity terms from various protected groups, and shows that it enables useful analysis of the bias in word embeddings.
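The relative-negative-sentiment idea can be sketched as follows. The sketch assumes a sentiment classifier has already produced P(negative | term) for each demographic identity term. These probabilities are normalized into a distribution over identity terms. That distribution is then compared with the uniform distribution via KL divergence, so a score of 0 means no term attracts a disproportionate share of negative sentiment. The classifier, the input dictionary, and the function name `rnsb` are assumptions of this sketch.

```python
import math
from typing import Dict


def rnsb(neg_prob: Dict[str, float]) -> float:
    """KL divergence between normalized negative-sentiment shares and uniform.

    neg_prob maps each identity term to P(negative | term) from some
    (assumed, already trained) sentiment classifier.
    """
    total = sum(neg_prob.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    k = len(neg_prob)
    kl = 0.0
    for p in neg_prob.values():
        q = p / total  # this term's share of the negative sentiment
        if q > 0:
            kl += q * math.log(q * k)  # q * log(q / (1 / k))
    return kl


# Hypothetical classifier outputs for two identity terms.
print(rnsb({"group_a": 0.3, "group_b": 0.3}))            # 0.0
print(round(rnsb({"group_a": 0.9, "group_b": 0.1}), 3))  # 0.368
```

The score is bounded above by log k for k identity terms. It reaches that bound only when a single term receives all of the negative sentiment.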

Gender Bias in Contextualized Word Embeddings

This work shows that a state-of-the-art coreference system that depends on ELMo inherits its bias and exhibits significant bias on the WinoBias probing corpus, and explores two methods to mitigate such gender bias.

Learning Gender-Neutral Word Embeddings

This work proposes a novel training procedure for learning gender-neutral word embeddings that preserves gender information in certain dimensions of word vectors while compelling the other dimensions to be free of gender influence.

Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them

Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society, causing serious concern. Several recent…

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings

This work proposes a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) beyond binary settings such as binary gender.
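A multiclass extension in this spirit can be sketched in three steps. First, center each defining set (for example, a set of religion terms) on its own mean. Next, find the top-k principal directions of all the centered vectors. Finally, project those directions out of words that should be neutral. To stay dependency-free, this sketch uses power iteration with deflation instead of a library SVD. The defining sets, `k`, and the iteration count are inputs you would choose, and the paper's exact procedure may differ.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def centered_rows(defining_sets: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Center every defining set on its own mean and stack the results."""
    rows: List[Vector] = []
    for group in defining_sets:
        dim = len(group[0])
        mean = [sum(v[i] for v in group) / len(group) for i in range(dim)]
        rows.extend([v[i] - mean[i] for i in range(dim)] for v in group)
    return rows


def bias_subspace(defining_sets: Sequence[Sequence[Vector]], k: int,
                  iters: int = 200, seed: int = 0) -> List[Vector]:
    """Top-k principal directions of the centered rows (power iteration + deflation)."""
    rows = centered_rows(defining_sets)
    dim = len(rows[0])
    rng = random.Random(seed)
    basis: List[Vector] = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            # Apply X^T X to v without forming the covariance matrix.
            xv = [dot(r, v) for r in rows]
            v = [sum(c * r[i] for c, r in zip(xv, rows)) for i in range(dim)]
            # Deflation: remove components along directions already found.
            for b in basis:
                c = dot(v, b)
                v = [vi - c * bi for vi, bi in zip(v, b)]
            norm = math.sqrt(dot(v, v))
            if norm < 1e-12:
                raise ValueError("k exceeds the rank of the centered defining sets")
            v = [x / norm for x in v]
        basis.append(v)
    return basis


def neutralize(w: Sequence[float], basis: Sequence[Vector]) -> Vector:
    """Remove w's projection onto the (orthonormal) bias subspace."""
    out = list(w)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out
```

Defining sets may have any size, so a set of three or more class terms is handled the same way as a binary pair.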

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
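Hard debiasing in the style of Bolukbasi et al. (2016) does more than remove a bias direction from neutral words. It also includes an equalize step, which makes a pair such as ("grandmother", "grandfather") differ only along the bias direction, so that every neutral word ends up equidistant from both. Below is a minimal single-direction sketch. It assumes unit-length word vectors and a unit bias direction `g`, and it illustrates the technique rather than reproducing the authors' code.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def scale(v: Sequence[float], c: float) -> Vector:
    return [c * x for x in v]


def sub(u: Sequence[float], v: Sequence[float]) -> Vector:
    return [a - b for a, b in zip(u, v)]


def add(u: Sequence[float], v: Sequence[float]) -> Vector:
    return [a + b for a, b in zip(u, v)]


def equalize(words: Sequence[Vector], g: Vector) -> List[Vector]:
    """Equalize a set of unit vectors with respect to a unit bias direction g.

    Each output keeps the set's shared component nu (orthogonal to g),
    differs from the others only along g, and has unit length.
    """
    dim = len(g)
    mu = [sum(w[i] for w in words) / len(words) for i in range(dim)]
    mu_b = scale(g, dot(mu, g))  # component of the mean along g
    nu = sub(mu, mu_b)           # component of the mean orthogonal to g
    along = math.sqrt(max(0.0, 1.0 - dot(nu, nu)))
    out = []
    for w in words:
        d = sub(scale(g, dot(w, g)), mu_b)
        d_norm = math.sqrt(dot(d, d))
        if d_norm == 0:
            raise ValueError("word does not differ from the set mean along g")
        out.append(add(nu, scale(d, along / d_norm)))
    return out


# Toy unit vectors, for illustration only.
g = [1.0, 0.0, 0.0]
grandfather, grandmother = equalize([[0.6, 0.8, 0.0], [-0.8, 0.6, 0.0]], g)
neutral = [0.0, 0.6, 0.8]  # no component along g
print(round(dot(grandfather, neutral), 6) == round(dot(grandmother, neutral), 6))  # True
```

Because the outputs share nu and differ only along g, any word with no component along g has the same inner product with every member of the set.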

Annotation Artifacts in Natural Language Inference Data

It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
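Word-class correlations like those described above can be measured with pointwise mutual information (PMI) between each hypothesis word and each inference label, ignoring the premise entirely. This is a minimal sketch, assuming lowercase whitespace tokenization and add-one smoothing. The smoothing constant, the tokenizer, and the toy examples are illustrative assumptions, not the paper's settings.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def word_label_pmi(examples: Sequence[Tuple[str, str]],
                   smoothing: float = 1.0) -> Dict[Tuple[str, str], float]:
    """PMI(word, label) computed from hypotheses alone (premises are ignored).

    examples: (hypothesis, label) pairs. Tokens are lowercase whitespace
    splits, and every (word, label) cell gets `smoothing` added to its count.
    """
    joint: Counter = Counter()
    for hypothesis, label in examples:
        for token in hypothesis.lower().split():
            joint[(token, label)] += 1
    words = {w for w, _ in joint}
    labels = {label for _, label in examples}
    counts = {(w, lab): joint[(w, lab)] + smoothing for w in words for lab in labels}
    total = sum(counts.values())
    word_tot: Counter = Counter()
    label_tot: Counter = Counter()
    for (w, lab), c in counts.items():
        word_tot[w] += c
        label_tot[lab] += c
    # PMI = log p(w, lab) / (p(w) p(lab)) = log (c * N) / (c_w * c_lab)
    return {
        (w, lab): math.log(c * total / (word_tot[w] * label_tot[lab]))
        for (w, lab), c in counts.items()
    }


# Toy hypotheses with labels, invented for illustration.
examples = [
    ("Nobody is sleeping", "contradiction"),
    ("A man is not outside", "contradiction"),
    ("A person is outside", "entailment"),
    ("Some people are outdoors", "entailment"),
]
pmi = word_label_pmi(examples)
print(pmi[("not", "contradiction")] > pmi[("not", "entailment")])  # True
```

A high PMI for a word such as "not" with the contradiction label is the kind of artifact a hypothesis-only classifier can exploit.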

A Decomposable Attention Model for Natural Language Inference

We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially…
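The attend, compare, and aggregate decomposition can be sketched as follows. Unnormalized scores e_ij = F(a_i)·F(b_j) softly align each premise token to the hypothesis, and each hypothesis token to the premise. Each aligned vector is then compared with its original, and the results are summed. In the model, F and G are learned feed-forward networks. This sketch defaults both to identity functions as placeholders and omits the final classification step, so it only illustrates the data flow.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Sequence[float]]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def attend(a: Sequence[Vector], b: Sequence[Vector],
           f: Callable[[Vector], Vector] = list) -> Tuple[List[Vector], List[Vector]]:
    """Soft-align premise tokens a to hypothesis tokens b and vice versa.

    f stands in for the learned feed-forward network F (identity by default).
    """
    fa = [f(list(x)) for x in a]
    fb = [f(list(y)) for y in b]
    # e[i][j] = F(a_i) . F(b_j)
    e = [[sum(p * q for p, q in zip(x, y)) for y in fb] for x in fa]
    # beta_i: the part of b aligned to a_i (softmax over j).
    beta = [weighted_sum(softmax(row), b) for row in e]
    # alpha_j: the part of a aligned to b_j (softmax over i).
    alpha = [weighted_sum(softmax([e[i][j] for i in range(len(a))]), a)
             for j in range(len(b))]
    return beta, alpha


def compare_and_aggregate(a: Sequence[Vector], beta: Sequence[Vector],
                          g: Callable[[Vector], Vector] = list) -> Vector:
    """Apply G to each concatenation [a_i; beta_i] and sum over positions."""
    compared = [g(list(ai) + list(bi)) for ai, bi in zip(a, beta)]
    return [sum(col) for col in zip(*compared)]
```

Each alignment is computed independently per token pair, which is what lets the subproblems be solved separately. Summing in the aggregate step makes the result independent of sequence order.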