OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

@article{Dev2021OSCaROS,
  title={OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings},
  author={Sunipa Dev and Tao Li and J. M. Phillips and Vivek Srikumar},
  journal={ArXiv},
  year={2021},
  volume={abs/2007.00049}
}
Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks. While existing methods are effective at mitigating biases by linear projection, such methods are too aggressive: they not only remove bias, but also erase valuable information from word embeddings. We develop new measures for evaluating specific information retention that demonstrate the tradeoff between bias removal and information retention. To address this… 
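
The abstract's contrast can be made concrete with a small, hedged sketch. Assuming word vectors are NumPy arrays and two concept directions have already been estimated (say, a gender direction from he-she-style differences and an occupation direction), `project_out` below is the standard linear-projection debiasing the abstract calls too aggressive, and `rectify` is a simplified, ungraded illustration of correcting the geometry between two subspaces rather than deleting one; the actual OSCaR method applies a graded rotation and differs in detail. All function names here are illustrative, not from the paper.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def project_out(x, v):
    """Linear-projection debiasing: drop the component of the word vector x
    along the bias direction v.  This is the 'aggressive' baseline the
    abstract refers to -- everything correlated with v is erased."""
    v = unit(v)
    return x - np.dot(x, v) * v

def rectify(x, v1, v2):
    """Simplified (ungraded) subspace correction: re-express the part of x
    lying in span(v1, v2) in an orthogonalized basis, so the two concept
    directions no longer correlate but neither is deleted.  The published
    OSCaR method uses a graded rotation and differs in detail."""
    v1, v2 = unit(v1), unit(v2)
    v2_orth = unit(v2 - np.dot(v2, v1) * v1)     # v2 with its v1-component removed
    c, s = np.dot(v2, v1), np.dot(v2, v2_orth)   # cosine / sine between v1 and v2
    a, b = np.dot(x, v1), np.dot(x, v2_orth)     # coordinates of x in the 2D span
    rest = x - a * v1 - b * v2_orth              # component outside the span, untouched
    beta = b / s                                 # coordinate of x along the original v2
    alpha = a - beta * c                         # coordinate of x along v1
    return rest + alpha * v1 + beta * v2_orth    # same coordinates, orthogonal basis now
```

The contrast is exactly the tradeoff the abstract describes: `project_out` zeroes every component along v, while `rectify` keeps the information carried by both directions and only removes the correlation between them.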

Citations

Marked Attribute Bias in Natural Language Inference

A new observation of gender bias in a downstream NLP application, marked attribute bias in natural language inference, is presented, and a new postprocessing debiasing scheme for static word embeddings is proposed.

Iterative adversarial removal of gender bias in pretrained word embeddings

This paper proposes an iterative and adversarial procedure to remove gender influence from word representations that should otherwise be free of it, while retaining meaningful gender information in words that are inherently charged with gender polarity.

VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

Word vector embeddings have been shown to contain and amplify biases in the data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases.

MABEL: Attenuating Gender Bias using Textual Entailment Data

This work proposes MABEL (a Method for Attenuating Gender Bias using Entailment Labels), an intermediate pre-training approach for mitigating gender bias in contextualized representations, and introduces an alignment regularizer that pulls identical entailment pairs along opposite gender directions closer.

UNQOVERing Stereotypical Biases via Underspecified Questions

UNQOVER, a general framework to probe and quantify biases through underspecified questions, is presented, showing that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence.

The Geometry of Distributed Representations for Better Alignment, Attenuated Bias, and Improved Interpretability

This work addresses some of the problems pertaining to the transparency and interpretability of high-dimensional language representations, including the detection, quantification, and mitigation of socially biased associations in language representations.

An Interactive Visual Demo of Bias Mitigation Techniques for Word Representations From a Geometric Perspective

This demo utilizes interactive visualization to increase the interpretability of a number of state-of-the-art techniques designed to identify, mitigate, and attenuate biases in word representations, in particular, from a geometric perspective.

A Survey on Bias in Deep NLP

Bias is introduced in a formal way, its treatment in several networks is reviewed in terms of detection and correction, and a strategy to deal with bias in deep NLP is proposed.

Linear Adversarial Concept Erasure

This paper formulates the problem of identifying and erasing a linear subspace that corresponds to a given concept in order to prevent linear predictors from recovering the concept, and recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation.

References

Showing 1-10 of 38 references

On Measuring and Mitigating Biased Inferences of Word Embeddings

A mechanism for measuring stereotypes using the task of natural language inference is designed, a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe) is demonstrated, and it is shown that for gender bias these techniques extend to contextualized embeddings when applied selectively only to their static components.
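
A hedged sketch of how an NLI-based probe of this kind can be scored, assuming a caller-supplied `predict_probs(premise, hypothesis)` that returns (entailment, neutral, contradiction) probabilities; the template wording, word lists, and the exact aggregate statistics used in the paper may differ.

```python
def neutrality_score(predict_probs, subjects, attributes,
                     template="The {} bought a bagel."):
    """Build underspecified premise/hypothesis pairs such as
    'The doctor bought a bagel.' vs. 'The man bought a bagel.'; an unbiased
    model should favor the 'neutral' label.  Returns the mean neutral
    probability over all pairs -- lower values suggest stereotypical
    inferences."""
    scores = []
    for subj in subjects:
        for attr in attributes:
            premise = template.format(subj)
            hypothesis = template.format(attr)
            _entail, neutral, _contradict = predict_probs(premise, hypothesis)
            scores.append(neutral)
    return sum(scores) / len(scores)

# Hypothetical usage, with any three-way NLI model wrapped as predict_probs:
# score = neutrality_score(predict_probs, ["doctor", "nurse"], ["man", "woman"])
```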

Attenuating Bias in Word Vectors

New simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them are explored, and it is verified that names are masked carriers of gender bias, which can then be used as a tool to attenuate bias in embeddings.
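
One common way to surface such words (not necessarily the exact method explored in the paper) is to rank the vocabulary by the size of each vector's component along a gender direction; a minimal sketch, assuming an `emb` dict of word-to-NumPy-vector mappings:

```python
import numpy as np

def most_gendered(emb, he="he", she="she", k=20):
    """Rank vocabulary words by the absolute cosine similarity of their
    vectors to the he-she direction."""
    g = emb[he] - emb[she]
    g = g / np.linalg.norm(g)
    def score(word):
        x = emb[word]
        return abs(np.dot(x, g)) / np.linalg.norm(x)
    return sorted(emb, key=score, reverse=True)[:k]
```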

A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces

Experimental findings across three embedding methods suggest that the proposed debiasing models are robust and widely applicable: they often completely remove the bias both implicitly and explicitly without degradation of semantic information encoded in any of the input distributional spaces.

VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

Word vector embeddings have been shown to contain and amplify biases in the data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases.

Gender Bias in Contextualized Word Embeddings

It is shown that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus, and two methods to mitigate such gender bias are explored.

Learning Gender-Neutral Word Embeddings

A novel training procedure for learning gender-neutral word embeddings that preserves gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence is proposed.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
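
A minimal sketch of the neutralize and equalize steps associated with this approach, assuming unit-normalized vectors and a single precomputed gender direction g; the published algorithm estimates a gender subspace from several definitional pairs and decides separately which words to treat as gender-neutral.

```python
import numpy as np

def neutralize(w, g):
    """Remove the gender-direction component from a gender-neutral word."""
    g = g / np.linalg.norm(g)
    return w - np.dot(w, g) * g

def equalize(wa, wb, g):
    """Make a definitional pair (e.g. 'grandmother'/'grandfather') equidistant
    from every neutralized word: both keep the shared gender-neutral part of
    their mean and differ only along g.  Assumes unit-normalized vectors."""
    g = g / np.linalg.norm(g)
    mu = (wa + wb) / 2
    mu_b = np.dot(mu, g) * g                       # gender component of the mean
    nu = mu - mu_b                                 # shared gender-neutral part
    scale = np.sqrt(max(1.0 - np.linalg.norm(nu) ** 2, 0.0))
    def reproject(w):
        w_b = np.dot(w, g) * g
        direction = (w_b - mu_b) / np.linalg.norm(w_b - mu_b)
        return nu + scale * direction
    return reproject(wa), reproject(wb)
```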

Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

This work presents Iterative Null-space Projection (INLP), a novel method for removing information from neural representations based on repeated training of linear classifiers that predict a certain property the authors aim to remove, followed by projection of the representations on their null-space.
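
A minimal sketch of the INLP iteration for a binary protected attribute, using scikit-learn's LogisticRegression as the linear classifier; the published method also covers multiclass attributes (by removing the classifier's entire rowspace) and other classifier choices, and the iteration count here is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, y, n_iters=10):
    """Iterative Nullspace Projection, sketched for a binary attribute y:
    repeatedly fit a linear classifier for y on the representations X, then
    project onto the nullspace of its weight vector so that direction can no
    longer be used.  Returns the composed projection matrix P; the guarded
    representations are X @ P.T."""
    d = X.shape[1]
    P = np.eye(d)
    Xp = X.copy()
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(Xp, y)
        w = clf.coef_ / np.linalg.norm(clf.coef_)   # unit weight vector, shape (1, d)
        P_null = np.eye(d) - w.T @ w                # projection onto its nullspace
        P = P_null @ P
        Xp = Xp @ P_null                            # P_null is symmetric
    return P
```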

Towards Understanding Gender Bias in Relation Extraction

Recent developments in Neural Relation Extraction (NRE) have made significant strides towards Automated Knowledge Base Construction. While much attention has been dedicated towards improvements in

Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

This work proposes to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference to reduce the magnitude of bias amplification in multilabel object classification and visual semantic role labeling.