On Measuring and Mitigating Biased Inferences of Word Embeddings

@article{Dev2020OnMA,
  title={On Measuring and Mitigating Biased Inferences of Word Embeddings},
  author={Sunipa Dev and Tao Li and J. M. Phillips and Vivek Srikumar},
  journal={ArXiv},
  year={2020},
  volume={abs/1908.09369}
}
Word embeddings carry stereotypical connotations from the text they are trained on, which can lead to invalid inferences in downstream models that rely on them. We use this observation to design a mechanism for measuring stereotypes using the task of natural language inference. We demonstrate a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe). Further, we show that for gender bias, these techniques extend to contextualized embeddings when applied… 
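The mitigation strategy alluded to for GloVe is, at its core, linear projection along a learned bias direction. A minimal sketch, assuming a precomputed bias direction g (e.g., derived from gendered word pairs); function and variable names are illustrative:

```python
import numpy as np

def debias(vectors: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Project out the bias direction g from each row of `vectors`."""
    g = g / np.linalg.norm(g)                  # unit-norm bias direction
    return vectors - np.outer(vectors @ g, g)  # subtract each row's component along g
```

Every debiased vector is orthogonal to g afterwards, so no information can be read off that direction; the "Lipstick on a Pig" reference below examines what such projections leave behind.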
OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings
TLDR
OSCaR (Orthogonal Subspace Correction and Rectification), a bias-mitigating method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale, is proposed.
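As a rough illustration of "disentangling rather than removing": one can build a single linear map that leaves one concept direction (say, gender) intact while making a second concept direction (say, occupations) orthogonal to it, and apply that map to all vectors. This is a simplified linear stand-in, not OSCaR's actual graded rotation:

```python
import numpy as np

def rectify(vectors: np.ndarray, v1: np.ndarray, v2: np.ndarray) -> np.ndarray:
    """Fix direction v1 and send v2 to its component orthogonal to v1.

    Simplified linear stand-in for OSCaR's graded rotation: only the
    2D span of (v1, v2) is altered; assumes v1 and v2 are not parallel.
    """
    u1 = v1 / np.linalg.norm(v1)
    v2_perp = v2 - (v2 @ u1) * u1             # part of v2 orthogonal to u1
    u2 = v2_perp / np.linalg.norm(v2_perp)
    B = np.stack([u1, u2], axis=1)            # (dim, 2) orthonormal basis
    a, b = v2 @ u1, v2 @ u2                   # coordinates of v2 in (u1, u2)
    T = np.array([[1.0, -a / b],                   # keeps u1 fixed,
                  [0.0, np.linalg.norm(v2) / b]])  # sends v2 to |v2| * u2
    M = np.eye(len(v1)) - B @ B.T + B @ T @ B.T    # identity off the 2D span
    return vectors @ M.T
```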
Extensive study on the underlying gender bias in contextualized word embeddings
TLDR
This study points out the advantages and limitations of the various evaluation measures that are used and aims to standardize the evaluation of gender bias in contextualized word embeddings.
Marked Attribute Bias in Natural Language Inference
TLDR
A new observation of gender bias in a downstream NLP application, marked attribute bias in natural language inference, is presented, and a new postprocessing debiasing scheme for static word embeddings is proposed.
Debiasing Pre-trained Contextualised Embeddings
TLDR
A fine-tuning method that can be applied at the token or sentence level to debias pre-trained contextualised embeddings is proposed, and it is found that applying token-level debiasing for all tokens and across all layers of a contextualised embedding model produces the best performance.
VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations
Word vector embeddings have been shown to contain and amplify biases in the data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these…
UNQOVERing Stereotypical Biases via Underspecified Questions
TLDR
UNQOVER, a general framework to probe and quantify biases through underspecified questions, is presented, showing that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence.
Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages
TLDR
The current state-of-the-art method for debiasing monolingual word embeddings is advanced so as to generalize well in a multilingual setting, and the significance of the bias-mitigation approach for downstream NLP applications is demonstrated.
Sustainable Modular Debiasing of Language Models
TLDR
An extensive evaluation, encompassing three intrinsic and two extrinsic bias measures, renders ADELE very effective in bias mitigation, and it is shown that, due to its modular nature, ADELE, coupled with task adapters, retains fairness even after large-scale downstream training.
Unmasking the Mask - Evaluating Social Biases in Masked Language Models
TLDR
All Unmasked Likelihood (AUL), a bias evaluation measure that predicts all tokens in a test case given the MLM embedding of the unmasked input, is proposed and it is found that AUL accurately detects different types of biases in MLMs.
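A minimal sketch of the AUL idea as described, using Hugging Face Transformers (the model choice and function name are illustrative, not from the paper's code): feed the unmasked sentence through an MLM once and average the log-probability the model assigns to every original token.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def aul_score(sentence: str) -> float:
    """Average log-probability of every original token, with no masking."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]              # (seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    ids = enc["input_ids"][0]
    token_scores = log_probs[torch.arange(len(ids)), ids]
    return token_scores[1:-1].mean().item()          # skip [CLS] and [SEP]
```

Scores for stereotypical and anti-stereotypical variants of a test sentence can then be compared to estimate the model's bias.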
RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models
TLDR
RedditBias is presented, the first conversational data set grounded in actual human conversations from Reddit, allowing bias measurement and mitigation across four important bias dimensions: gender, race, religion, and queerness; an evaluation framework is also developed.

References

SHOWING 1-10 OF 25 REFERENCES
Attenuating Bias in Word Vectors
TLDR
New, simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them are explored, and it is shown that names are masked carriers of gender bias that can then be used as a tool to attenuate bias in embeddings.
A Transparent Framework for Evaluating Unintended Demographic Bias in Word Embeddings
TLDR
This work presents a transparent framework and metric for evaluating discrimination across protected groups, based on the relative negative sentiment associated with demographic identity terms, and shows that it enables useful analysis of the bias in word embeddings.
Gender Bias in Contextualized Word Embeddings
TLDR
It is shown that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus, and two methods to mitigate such gender bias are explored.
Learning Gender-Neutral Word Embeddings
TLDR
A novel training procedure for learning gender-neutral word embeddings that preserves gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence is proposed.
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society, causing serious concern. Several recent…
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
TLDR
This work proposes a method to debias word embeddings in multiclass settings such as race and religion, extending the work of (Bolukbasi et al., 2016) from the binary setting, such as binary gender.
Deep Contextualized Word Representations
TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
TLDR
This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving useful properties such as the ability to cluster related concepts and to solve analogy tasks.
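A compressed sketch of the two core steps behind this result, identifying a gender direction and neutralizing words along it; the pair list, the PCA-via-SVD shortcut, and the dict-style lookup emb are simplifications of the paper's procedure, and its equalize step is omitted:

```python
import numpy as np

def gender_direction(emb: dict) -> np.ndarray:
    """Top principal component of definitional difference vectors."""
    pairs = [("she", "he"), ("woman", "man"), ("her", "his")]
    diffs = np.stack([emb[a] - emb[b] for a, b in pairs])
    _, _, vt = np.linalg.svd(diffs - diffs.mean(axis=0), full_matrices=False)
    return vt[0] / np.linalg.norm(vt[0])

def neutralize(v: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Remove the component of v along the bias direction g."""
    return v - (v @ g) * g
```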
Annotation Artifacts in Natural Language Inference Data
TLDR
It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
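The hypothesis-only finding is easy to replicate in spirit; a sketch with scikit-learn, where the toy lists stand in for SNLI hypothesis strings and gold labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data: in practice these are SNLI/MultiNLI hypotheses and labels.
train_hyps = ["A man is sleeping.", "Nobody is outside.", "A dog runs."]
train_labels = ["contradiction", "contradiction", "entailment"]

# The premise is deliberately withheld, so any above-chance accuracy
# reflects annotation artifacts in the hypotheses, not actual inference.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_hyps, train_labels)
print(clf.predict(["A cat is sleeping."]))
```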
A Decomposable Attention Model for Natural Language Inference
We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially…
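The proposed attend / compare / aggregate decomposition is compact enough to sketch in PyTorch; the dimensions are illustrative, and the paper's intra-sentence attention and deeper feed-forward networks are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposableAttention(nn.Module):
    """Minimal attend / compare / aggregate model over pre-embedded inputs."""
    def __init__(self, dim: int = 300, hidden: int = 200, classes: int = 3):
        super().__init__()
        self.attend = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.compare = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU())
        self.aggregate = nn.Linear(2 * hidden, classes)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a: (B, La, dim) premise embeddings; b: (B, Lb, dim) hypothesis.
        e = self.attend(a) @ self.attend(b).transpose(1, 2)     # (B, La, Lb)
        beta = F.softmax(e, dim=2) @ b        # b softly aligned to each a token
        alpha = F.softmax(e, dim=1).transpose(1, 2) @ a         # a aligned to b
        v1 = self.compare(torch.cat([a, beta], dim=2)).sum(1)   # (B, hidden)
        v2 = self.compare(torch.cat([b, alpha], dim=2)).sum(1)
        return self.aggregate(torch.cat([v1, v2], dim=1))       # class logits

# Example: a batch of 8 premise/hypothesis pairs of lengths 12 and 15.
logits = DecomposableAttention()(torch.randn(8, 12, 300), torch.randn(8, 15, 300))
```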