Towards Debiasing Sentence Representations

@inproceedings{Liang2020TowardsDS,
  title={Towards Debiasing Sentence Representations},
  author={Paul Pu Liang and Irene Z Li and Emily Zheng and Y. Lim and R. Salakhutdinov and Louis-Philippe Morency},
  booktitle={ACL},
  year={2020}
}
As natural language processing methods are increasingly deployed in real-world scenarios such as healthcare, legal systems, and social science, it becomes necessary to recognize the role they potentially play in shaping social biases and stereotypes. Previous work has revealed the presence of social biases in widely used word embeddings involving gender, race, religion, and other social constructs. While some methods were proposed to debias these word-level embeddings, there is a need to…

Citations
He is very intelligent, she is very beautiful? On Mitigating Social Biases in Language Modelling and Generation
TLDR
This paper proposes an approach to mitigating social biases in BERT, a large pre-trained contextual language model, shows its effectiveness on fill-in-the-blank sentence completion and summarization tasks, and introduces lexical co-occurrence-based bias penalization in the decoder units of generation frameworks.
Towards Understanding and Mitigating Social Biases in Language Models
TLDR
The empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information for high-fidelity text generation, thereby pushing forward the performance-fairness Pareto frontier.
Evaluating Bias In Dutch Word Embeddings
TLDR
The gender bias implicit in Dutch embeddings is explored, investigating whether approaches developed for English can also be applied to Dutch, and how English techniques for measuring and reducing bias can be used in Dutch by adequately translating the data and taking into account the unique characteristics of the language.
RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models
TLDR
RedditBias is presented, the first conversational data set grounded in actual human conversations from Reddit, allowing for bias measurement and mitigation across four important dimensions: gender, race, religion, and queerness; an evaluation framework is also developed.
Modeling Profanity and Hate Speech in Social Media with Semantic Subspaces
TLDR
This study identifies profane subspaces in word and sentence representations and explores their generalization capability on a variety of similar and distant target tasks, observing that the subspace-based representations transfer more effectively than standard BERT representations in the zero-shot setting.
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
TLDR
All three of the widely used MLMs evaluated are found to substantially favor sentences that express stereotypes in every category of CrowS-Pairs, a benchmark for measuring some forms of social bias in language models against protected demographic groups in the US.
FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders
TLDR
This paper proposes the first neural debiasing method for a pretrained sentence encoder, which transforms the pretrained encoder outputs into debiased representations via a fair filter (FairFil) network.
Queens Are Powerful Too: Mitigating Gender Bias in Dialogue Generation
TLDR
This work measures gender bias in dialogue data, examines how this bias is amplified in subsequent generative chit-chat dialogue models, and considers three techniques to mitigate gender bias: counterfactual data augmentation, targeted data collection, and bias-controlled training.
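As a loose illustration of the first of those techniques, counterfactual data augmentation swaps gendered words in training sentences to balance the data. The toy sketch below is purely illustrative, not the paper's implementation; real systems handle casing, morphology, names, and ambiguous words far more carefully.

```python
# Hypothetical minimal gender-swap augmentation (illustrative only):
# ignores casing, morphology, names, and ambiguity (e.g. object "her" vs. possessive "her").
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "man": "woman", "woman": "man"}

def gender_swap(sentence):
    # Replace each token with its counterpart if one exists, else keep it.
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.lower().split())

print(gender_swap("She left her bag"))  # -> "he left his bag"
```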
Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?
Word embeddings are widely used in Natural Language Processing (NLP) for a vast range of applications. However, it has been consistently proven that these embeddings reflect the same human biases…
Language (Technology) is Power: A Critical Survey of “Bias” in NLP
TLDR
A greater recognition of the relationships between language and social hierarchies is urged, encouraging researchers and practitioners to articulate their conceptualizations of "bias" and to center work around the lived experiences of members of communities affected by NLP systems.

References

Showing 1–10 of 55 references
On Measuring Social Biases in Sentence Encoders
TLDR
The Word Embedding Association Test is extended to measure bias in sentence encoders, yielding mixed results, including suspicious patterns of sensitivity that suggest the test's assumptions may not hold in general.
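For context on the test being extended here: WEAT compares mean cosine similarities between two target sets and two attribute sets of embeddings, summarized as an effect size. A minimal NumPy sketch of the standard formulation follows (the sentence-level variant feeds sentence embeddings through the same formula); inputs are lists of 1-D vectors, and this is an illustration rather than any paper's released code.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean cosine similarity to attribute set A minus to attribute set B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Effect size: difference in mean association between target sets X and Y,
    # normalized by the standard deviation over all targets.
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```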
Identifying and Reducing Gender Bias in Word-Level Language Models
TLDR
This study proposes a metric for measuring gender bias together with a regularization loss term that minimizes the projection of encoder-trained embeddings onto an embedding subspace that encodes gender, and finds this regularization effective in reducing gender bias.
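A schematic of the kind of projection penalty described above, in PyTorch; variable names, shapes, and the exact normalization are assumptions for illustration, not the paper's implementation.

```python
import torch

def gender_projection_penalty(embeddings, gender_basis, lam=1.0):
    # embeddings:   (V, d) embedding matrix being trained
    # gender_basis: (k, d) orthonormal basis spanning the gender subspace
    # Penalize the squared norm of each embedding's projection onto that subspace,
    # pushing learned vectors to carry no component along the gender directions.
    coords = embeddings @ gender_basis.t()   # (V, k) subspace coordinates
    return lam * coords.pow(2).sum(dim=1).mean()

# Hypothetical use alongside the usual language-modelling objective:
# total_loss = lm_loss + gender_projection_penalty(model.embedding.weight, B, lam=0.5)
```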
Measuring Bias in Contextualized Word Representations
TLDR
A template-based method to quantify bias in BERT is proposed, and it is shown that this method obtains more consistent results in capturing social biases than the traditional cosine-based method.
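To make the template idea concrete: a masked language model's preferences can be probed by comparing the probabilities it assigns to demographic words in a template slot. A rough sketch using the Hugging Face fill-mask pipeline follows; the model choice and templates are illustrative, and the paper's actual measure additionally normalizes by the word's prior probability.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")  # model choice is illustrative

def slot_prob(template, word):
    # Probability the model assigns to `word` in the [MASK] slot of `template`.
    return fill(template, targets=[word])[0]["score"]

# A simple gendered template probe:
print(slot_prob("[MASK] is a programmer.", "he"))
print(slot_prob("[MASK] is a programmer.", "she"))
```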
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
TLDR
This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
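The core "neutralize" step of this hard-debiasing approach removes each word vector's component along a learned gender direction; a minimal sketch (the full method also includes an "equalize" step over word pairs, omitted here):

```python
import numpy as np

def neutralize(w, g):
    # Remove w's component along the (unit-normalized) gender direction g:
    # w_debiased = w - (w . g) g
    g = g / np.linalg.norm(g)
    return w - (w @ g) * g
```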
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society, causing serious concern. Several recent…
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
TLDR
This work proposes a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) beyond binary settings such as binary gender.
What are the Biases in My Word Embedding?
TLDR
An algorithm for enumerating biases in word embeddings is proposed that outputs a number of Word Embedding Association Tests (WEATs) capturing various biases present in the data, which makes it easier to identify biases against intersectional groups, which depend on combinations of sensitive features.
Are We Consistently Biased? Multidimensional Analysis of Biases in Distributional Word Vectors
TLDR
This work presents a systematic study of biases encoded in distributional word vector spaces, and analyzes how consistent the bias effects are across languages, corpora, and embedding models.
Learning Gender-Neutral Word Embeddings
TLDR
A novel training procedure for learning gender-neutral word embeddings is proposed that preserves gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence.
Reducing Gender Bias in Abusive Language Detection
TLDR
Three mitigation methods, including debiased word embeddings, gender swap data augmentation, and fine-tuning with a larger corpus, can effectively reduce model bias by 90–98% and can be extended to correct model bias in other scenarios.