• Corpus ID: 59158788

Attenuating Bias in Word Vectors

Sunipa Dev and J. M. Phillips
Published in: International Conference on Artificial Intelligence and Statistics
Word vector representations are well-developed tools for various NLP and machine learning tasks and are known to retain significant semantic and syntactic structure of languages. […] We verify that names are masked carriers of gender bias and then use that as a tool to attenuate bias in embeddings. Further, we extend this property of names to show how they can be used to detect other types of bias in the embeddings, such as bias based on race, ethnicity, and age.
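
The abstract above does not spell out the attenuation procedure. As a rough illustration only, the sketch below assumes one simple approach: estimate a bias direction as the normalized difference between the mean vectors of two groups of names, then remove each word vector's component along that direction by linear projection. The toy 3-dimensional vectors and all function names (`bias_direction`, `project_out`, and so on) are hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def mean_vector(vectors):
    """Coordinate-wise mean of a list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def bias_direction(group_a, group_b):
    """Unit vector pointing from the mean of group_b to the mean of group_a."""
    mu_a = mean_vector(group_a)
    mu_b = mean_vector(group_b)
    return normalize([a - b for a, b in zip(mu_a, mu_b)])


def project_out(v, direction):
    """Remove the component of v that lies along the unit vector `direction`."""
    coefficient = dot(v, direction)
    return [a - coefficient * d for a, d in zip(v, direction)]


# Hypothetical toy embeddings: two groups of name vectors and one other word.
names_a = [[1.0, 0.2, 0.0], [0.8, 0.1, 0.1]]
names_b = [[-1.0, 0.2, 0.0], [-0.8, 0.1, 0.1]]
word = [0.5, 0.7, 0.3]

direction = bias_direction(names_a, names_b)
debiased = project_out(word, direction)

# Component along the estimated bias direction, before and after projection.
print(dot(word, direction), dot(debiased, direction))
```

With real embeddings, the name vectors would be looked up from a trained model, and the same projection would be applied to every word vector that should be neutral with respect to the attribute.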

Bias in word embeddings

A new technique for bias detection for gendered languages is developed and used to compare bias in embeddings trained on Wikipedia and on political social media data, and it is proved that existing biases are transferred to further machine learning models.

OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

OSCaR (Orthogonal Subspace Correction and Rectification), a bias-mitigating method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale, is proposed.

On Measuring and Mitigating Biased Inferences of Word Embeddings

A mechanism for measuring stereotypes using the task of natural language inference is designed, a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe) is demonstrated, and it is shown that, for gender bias, these techniques extend to contextualized embeddings when applied selectively only to the static components of contextualized embeddings.

A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces

Experimental findings across three embedding methods suggest that the proposed debiasing models are robust and widely applicable: they often completely remove the bias both implicitly and explicitly without degradation of semantic information encoded in any of the input distributional spaces.

sweater: Speedy Word Embedding Association Test and Extras Using R

The goal of this R package is to detect associations among words in word embedding spaces. Word embeddings can capture how similar or different two words are in terms of implicit and explicit

Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs

It is shown that the well-known analogy “man is to computer-programmer as woman is to homemaker” is due to word similarity rather than bias, which has important implications for work on measuring bias in embeddings and related work debiasing embeddings.

Evaluating Gender Bias in Hindi-English Machine Translation

This work attempts to evaluate and quantify the gender bias within a Hindi-English machine translation system by implementing a modified version of the existing TGBI metric based on the grammatical considerations for Hindi.

Socially Aware Bias Measurements for Hindi Language Representations

This work investigates the biases present in Hindi language representations such as caste and religion associated biases and demonstrates how biases are unique to specific language representations based on the history and culture of the region they are widely spoken in.

A Survey on Bias in Deep NLP

Bias is introduced in a formal way, its treatment in several networks is reviewed in terms of detection and correction, and a strategy to deal with bias in deep NLP is proposed.

Word Embeddings via Causal Inference: Gender Bias Reducing and Semantic Information Preserving

This work proposes a novel methodology that leverages a causal inference framework to effectively remove gender bias and achieves state-of-the-art results in gender-debiasing tasks.



Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.

Linguistic Regularities in Sparse and Explicit Word Representations

It is demonstrated that analogy recovery is not restricted to neural word embeddings, and that a similar amount of relational similarities can be recovered from traditional distributional word representations.

Women also Snowboard: Overcoming Bias in Captioning Models

A new Equalizer model is introduced that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present. It has lower error than prior work when describing images with people and mentioning their gender, and it more closely matches the ground truth ratio of sentences including women to sentences including men.

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Semantics derived automatically from language corpora contain human-like biases

It is shown that machines can learn word associations from written texts and that these associations mirror those learned by humans, as measured by the Implicit Association Test (IAT), and that applying machine learning to ordinary human language results in human-like semantic biases.

Neural Word Embedding as Implicit Matrix Factorization

It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks, and it is conjectured that this stems from the weighted nature of SGNS's factorization.

SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation

SimLex-999 is presented, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways, and explicitly quantifies similarity rather than association or relatedness so that pairs of entities that are associated but not actually similar have a low rating.

WordNet: An Electronic Lexical Database

Parts of the lexical database (nouns in WordNet, a semantic network of English verbs) and applications of WordNet (building semantic concordances) are presented.

Placing search in context: the concept revisited

A new conceptual paradigm for performing search in context is presented, that largely automates the search process, providing even non-professional users with highly relevant results.

Consumer Credit Risk Models Via Machine-Learning Algorithms

We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk. By combining customer transactions and credit bureau data from January 2005 to