Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

@article{Schick2021SelfDiagnosisAS,
  title={Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP},
  author={Timo Schick and Sahana Udupa and Hinrich Sch{\"u}tze},
  journal={Transactions of the Association for Computational Linguistics},
  year={2021},
  volume={9},
  pages={1408-1424}
}
Abstract

⚠ This paper contains prompts and model outputs that are offensive in nature. When trained on large, unfiltered crawls from the Internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: they often generate racist, sexist, violent, or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper…
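
The self-diagnosis capability described in the abstract can be sketched in a few lines: the model is shown a piece of text together with a short description of an undesired attribute, is asked whether the text exhibits that attribute, and the answer is read off by comparing the probabilities it assigns to "Yes" and "No". The sketch below is illustrative only; it assumes a GPT-2-style causal LM from Hugging Face transformers, and the template wording is an approximation of the paper's prompt, not a copy of it.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative sketch of self-diagnosis: ask the model whether a given text
# exhibits an undesired attribute and compare the probabilities it assigns
# to "Yes" vs. "No" as the next token. The template approximates the paper's.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
model.eval()

def self_diagnosis(text: str, attribute: str) -> float:
    prompt = f'"{text}"\nQuestion: Does the above text contain {attribute}?\nAnswer:'
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]
    probs = torch.softmax(next_token_logits, dim=-1)
    p_yes = probs[tokenizer.encode(" Yes")[0]]   # first BPE token of " Yes"
    p_no = probs[tokenizer.encode(" No")[0]]     # first BPE token of " No"
    return (p_yes / (p_yes + p_no)).item()       # probability of a positive self-diagnosis

print(self_diagnosis("I am going to hurt you.", "a threat"))

Self-debiasing then reuses this signal at decoding time: tokens whose probability rises when a description of the undesired behavior is prepended to the input are down-weighted during generation.
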
Analyzing the Limits of Self-Supervision in Handling Bias in Language
TLDR
This paper defines and comprehensively evaluates how well language models capture the semantics of four bias-related tasks: diagnosis, identification, extraction, and rephrasing, and finds that language models can perform these tasks to widely varying degrees across bias dimensions such as gender and political affiliation.
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
TLDR
It is shown that the largest 530B-parameter model is more effective at detecting social bias than smaller models (achieving at least a 13% improvement in the AUC metric) and also maintains a high AUC when the labeled repository is reduced to as few as 100 samples.
Inferring Offensiveness In Images From Natural Language Supervision
TLDR
It is shown that pre-trained transformers themselves provide a methodology for the automated curation of large-scale vision datasets and that one can select relevant prompts for rating the offensiveness of an image.
Toxicity Detection with Generative Prompt-based Inference
TLDR
This work explores the generative variant of zero-shot prompt-based toxicity detection with comprehensive trials on prompt engineering and highlights the strengths of its generative classification approach both quantitatively and qualitatively.
Leveraging Bias in Pre-Trained Word Embeddings for Unsupervised Microaggression Detection
TLDR
This study introduces an unsupervised method to detect microaggressions in natural language expressions that relies on pre-trained word embeddings, leveraging the bias encoded in the model in order to detect microaggressions in unseen textual instances.
Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts
TLDR
The proposed Auto-Debias approach can significantly reduce biases, including gender and racial bias, in pretrained language models such as BERT, RoBERTa, and ALBERT, and the improvement in fairness does not decrease the language models’ understanding abilities, as shown using the GLUE benchmark.
An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models
TLDR
An empirical survey of five recently proposed bias mitigation techniques (Counterfactual Data Augmentation, Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias) is performed, finding that Self-Debias is the strongest debiasing technique, obtaining improved scores on all bias benchmarks.
Mitigating harm in language models with conditional-likelihood filtration
TLDR
This work presents a methodology for programmatically identifying and removing harmful text from web-scale datasets and discusses the generalization of this method and how trigger phrases reflecting specific values can be used by researchers to build language models which are more closely aligned with their values.
Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks
TLDR
This paper proposes a novel DIALBIAS FRAME for analyzing the social bias in conversations pragmatically, which considers more comprehensive bias-related analyses rather than simple dichotomy annotations, and introduces CDAIL-BIAS DATASET that is the first well-annotated Chinese social bias dialog dataset.
Can Machines Help Us Answering Question 16 in Datasheets, and In Turn Reflecting on Inappropriate Content?
TLDR
This paper proposes to use the information stored in pre-trained transformer models to assist in the documentation process of large image datasets, and suggests that machines can indeed help dataset creators to answer Question 16 in Datasheets on inappropriate image content.

References

Showing 1-10 of 60 references
Identifying and Reducing Gender Bias in Word-Level Language Models
TLDR
This study proposes a metric for measuring gender bias and a regularization loss term for the language model that minimizes the projection of encoder-trained embeddings onto an embedding subspace that encodes gender, and finds this regularization method effective in reducing gender bias.
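
A rough sketch of this kind of projection penalty is shown below. It assumes a single precomputed gender direction g and a PyTorch embedding matrix; the paper itself estimates a gender subspace from definitional word pairs and applies the penalty only to gender-neutral words, so treat the names and the scalar weight as illustrative.

import torch

def bias_regularizer(embeddings: torch.Tensor, g: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    # Penalize the squared projection of word embeddings onto the gender direction g.
    # In the paper, a term of this form is added to the language-modeling loss.
    g = g / g.norm()                       # unit-length gender direction
    projections = embeddings @ g           # scalar projection of each embedding onto g
    return lam * (projections ** 2).sum()

# Hypothetical usage: total_loss = lm_loss + bias_regularizer(embedding_weight[neutral_ids], gender_dir)
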
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
TLDR
This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
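
The neutralize step at the core of this algorithm can be sketched as follows, assuming the gender direction g has already been estimated (for example from he/she-style definitional pairs); the full method additionally equalizes word pairs such as grandmother/grandfather, which is omitted here.

import numpy as np

def neutralize(w: np.ndarray, g: np.ndarray) -> np.ndarray:
    # Remove the component of word vector w that lies along the gender direction g,
    # then renormalize, so the debiased vector carries no gender component.
    g = g / np.linalg.norm(g)
    w_debiased = w - np.dot(w, g) * g
    return w_debiased / np.linalg.norm(w_debiased)
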
The Woman Worked as a Babysitter: On Biases in Language Generation
TLDR
The notion of the regard towards a demographic is introduced, the varying levels of regard towards different demographics are used as a defining metric for bias in NLG, and the extent to which sentiment scores are a relevant proxy metric for regard is analyzed.
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
TLDR
It is found that pretrained LMs can degenerate into toxic text even from seemingly innocuous prompts, and an empirical assessment of several controllable generation methods finds that, while data- or compute-intensive methods are more effective at steering away from toxicity than simpler solutions, no current method is failsafe against neural toxic degeneration.
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society, causing serious concern. Several recent…
Masked Language Model Scoring
TLDR
RoBERTa reduces an end-to-end LibriSpeech model’s WER by 30% relative and adds up to +1.7 BLEU on state-of-the-art baselines for low-resource translation pairs, with further gains from domain adaptation.
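
The scores behind these rescoring gains are pseudo-log-likelihoods: each token is masked in turn and the masked language model's log-probability of the original token is summed over positions. A minimal sketch, assuming a BERT-style model from Hugging Face transformers (the paper also evaluates RoBERTa and multilingual models):

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    # Mask each position in turn and accumulate the log-probability the MLM
    # assigns to the original token; higher scores mean more plausible sentences.
    ids = tokenizer(sentence, return_tensors="pt").input_ids[0]
    score = 0.0
    for i in range(1, len(ids) - 1):       # skip the [CLS] and [SEP] special tokens
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        score += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return score

print(pseudo_log_likelihood("The cat sat on the mat."))
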
Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference
TLDR
This work introduces Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task.
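
The cloze reformulation at the heart of PET can be illustrated with a single pattern and verbalizer for a sentiment task. This is only a sketch of the scoring step with an off-the-shelf BERT-style MLM; PET itself fine-tunes on the reformulated examples and ensembles several patterns, and the pattern and verbalizer words here are illustrative choices.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def cloze_classify(review: str) -> str:
    # Pattern: "<review> It was [MASK]." with verbalizers mapping labels to single tokens.
    prompt = f"{review} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    verbalizers = {"positive": "great", "negative": "terrible"}
    scores = {label: logits[tokenizer.convert_tokens_to_ids(token)].item()
              for label, token in verbalizers.items()}
    return max(scores, key=scores.get)

print(cloze_classify("The plot was dull and the acting was worse."))
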
GeDi: Generative Discriminator Guided Sequence Generation
TLDR
GeDi is proposed as an efficient method for using smaller LMs as generative discriminators to guide generation from large LMs, making them safer and more controllable; GeDi is found to give stronger controllability than the state-of-the-art method while also achieving generation speeds more than 30 times faster.
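
The guidance step can be viewed as a Bayes-rule reweighting of the base LM's next-token distribution: a small class-conditional LM is run with a desired and an undesired control code, the contrast between the two yields a posterior over the desired class for every candidate token, and that posterior rescales the base distribution. The numpy sketch below is a simplification under assumed inputs (precomputed log-probability vectors) and is not the paper's implementation, which also normalizes over partial-sequence length and applies extra filtering heuristics.

import numpy as np

def gedi_reweight(base_logp: np.ndarray,
                  desired_logp: np.ndarray,
                  undesired_logp: np.ndarray,
                  omega: float = 30.0) -> np.ndarray:
    # Posterior probability that each candidate next token keeps the text in the
    # desired class, from the contrast of the two class-conditional distributions.
    posterior = 1.0 / (1.0 + np.exp(undesired_logp - desired_logp))
    # Rescale the base LM's log-probabilities by posterior ** omega, then renormalize.
    weighted = base_logp + omega * np.log(posterior + 1e-12)
    weighted -= weighted.max()                         # numerical stability
    return weighted - np.log(np.sum(np.exp(weighted)))
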
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
TLDR
This work proposes to inject corpus-level constraints for calibrating existing structured prediction models and designs an algorithm based on Lagrangian relaxation for collective inference, reducing the magnitude of bias amplification in multi-label object classification and visual semantic role labeling.
Dictionary-based Debiasing of Pre-trained Word Embeddings
TLDR
Experimental results on standard benchmark datasets show that the proposed method can accurately remove unfair biases encoded in pre-trained word embeddings, while preserving useful semantics.