Mitigating Unwanted Biases with Adversarial Learning

B. Zhang, Blake Lemoine, and Margaret Mitchell. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society.
Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income… 
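The paper's predictor update removes the component of the prediction-loss gradient that would help the adversary recover the protected variable, then pushes against the adversary's gradient. Below is a minimal NumPy sketch of that projection-based update; the function names and toy gradient values are mine, not from the paper.

```python
import numpy as np

def proj(u, v):
    """Projection of vector u onto vector v."""
    return (u @ v) / (v @ v) * v

def debiased_gradient(grad_pred, grad_adv, alpha=1.0):
    """Predictor update direction: remove the component of the
    prediction gradient that helps the adversary, then subtract
    alpha times the adversary's gradient."""
    return grad_pred - proj(grad_pred, grad_adv) - alpha * grad_adv

# Hypothetical toy gradients for a 3-parameter predictor.
gp = np.array([1.0, 2.0, 3.0])   # gradient of the prediction loss
ga = np.array([0.0, 1.0, 0.0])   # gradient of the adversary's loss
g = debiased_gradient(gp, ga)
```

After the update, `g` has a negative component along `ga`: following it no longer aids the adversary.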


Generative Adversarial Networks for Mitigating Biases in Machine Learning Systems

Experimental results show that the proposed solution can efficiently mitigate different types of biases, while at the same time enhancing the prediction accuracy of the underlying machine learning model.

Representation Learning with Statistical Independence to Mitigate Bias

  • E. Adeli, Qingyu Zhao, K. Pohl
  • Computer Science
    2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
  • 2021
A model is proposed, based on adversarial training with two competing objectives, that learns features with maximum discriminative power for the task and minimal statistical mean dependence on the protected (bias) variable(s).

Efficiently Mitigating Classification Bias via Transfer Learning

The Upstream Bias Mitigation for Downstream Fine-Tuning (UBM) framework is proposed, which mitigates one or more bias factors in downstream classifiers via transfer learning from an upstream model.

Towards Learning an Unbiased Classifier from Biased Data via Conditional Adversarial Debiasing

A novel adversarial debiasing method is presented that addresses a feature spuriously connected to the labels of training images but statistically independent of the labels of test images, so that the automatic identification of relevant features during training is no longer perturbed by irrelevant ones.

Fair Representation for Safe Artificial Intelligence via Adversarial Learning of Unbiased Information Bottleneck

Non-discriminatory representation learning is formulated as a dual-objective optimization problem: encoding the data while obfuscating information about the protected features in the representation, by exploiting an unbiased information bottleneck.

Data Augmentation for Discrimination Prevention and Bias Disambiguation

A novel data augmentation technique creates a fairer dataset for model training and can also help characterize the type of bias in the dataset, i.e., whether it arises from a lack of representation for a particular group (sampling bias) or from human bias reflected in the labels (prejudice-based bias).

Latent Adversarial Debiasing: Mitigating Collider Bias in Deep Neural Networks

It is argued herein that the cause of failure is a combination of the deep structure of neural networks and the greedy gradient-driven learning process used, one that prefers easy-to-compute signals when available.

Bias-Resilient Neural Network

A method based on the adversarial training strategy learns discriminative features that are unbiased and invariant to the confounder(s), by incorporating a new adversarial loss function that encourages a vanishing correlation between the bias and the learned features.

Learning Fair Representations via an Adversarial Framework

A minimax adversarial framework, with a generator that captures the data distribution and produces latent representations and a critic that ensures the distributions across different protected groups are similar, provides a theoretical guarantee with respect to statistical parity and individual fairness.

Gradient Based Activations for Accurate Bias-Free Learning

This work shows that a biased discriminator can actually be used to improve the bias-accuracy tradeoff via a feature-masking approach based on the discriminator's gradients, and that this simple approach both reduces bias and significantly improves accuracy.



Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations

An adversarial training procedure is used to remove information about the sensitive attribute from the latent representation learned by a neural network, and the data distribution empirically drives the adversary's notion of fairness.

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
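The two models play a minimax game over the value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))]. A minimal sketch of evaluating that objective (the function name and toy inputs are mine):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """GAN minimax value V(D, G): mean log-confidence of the
    discriminator on real samples plus mean log-confidence that
    generated samples are fake."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# At the theoretical optimum the discriminator outputs 1/2
# everywhere, giving a value of -log 4.
v = gan_value(np.full(4, 0.5), np.full(4, 0.5))
```

The discriminator ascends this value while the generator descends it; training alternates between the two updates.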

A statistical framework for fair predictive algorithms

A method is proposed to remove bias from predictive models by removing all information regarding protected variables from the permitted training data; it is general enough to accommodate arbitrary data types, e.g., binary, continuous, etc.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
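The core "neutralize" step of this hard-debiasing approach projects a word vector off the identified bias direction. A minimal sketch, with a hypothetical 3-d embedding and bias direction of my own:

```python
import numpy as np

def neutralize(w, g):
    """Remove the component of word vector w along bias direction g,
    leaving w orthogonal to the bias subspace (here 1-dimensional)."""
    g = g / np.linalg.norm(g)
    return w - (w @ g) * g

# Hypothetical embedding and bias direction (e.g. he - she).
w = np.array([0.5, 1.0, -0.2])
g_dir = np.array([0.0, 2.0, 0.0])
w_debiased = neutralize(w, g_dir)
```

After neutralization the vector carries no component along the bias direction, while its other coordinates are untouched.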

Equality of Opportunity in Supervised Learning

This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features, and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
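Equality of opportunity asks that true-positive rates be equal across groups. A small sketch measuring the violation on toy data (function name and data are illustrative, not from the paper):

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group):
    """Largest difference in true-positive rate across groups:
    zero means the equality-of-opportunity criterion is satisfied."""
    tprs = []
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)  # qualified members of group g
        tprs.append(y_pred[mask].mean())
    return max(tprs) - min(tprs)

# Toy example: group 0 has TPR 0.5, group 1 has TPR 1.0.
y_true = np.array([1, 1, 1, 1])
y_pred = np.array([1, 0, 1, 1])
group  = np.array([0, 0, 1, 1])
gap = equal_opportunity_gap(y_true, y_pred, group)
```

The paper's post-processing adjusts a learned predictor (e.g. by group-specific thresholds) to drive this gap to zero.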

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
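The Adam update keeps exponential moving averages of the gradient and its square, corrects their initialization bias, and takes a normalized step. A minimal single-step sketch (variable names are mine; hyperparameter defaults follow the paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# On the first step the update magnitude is approximately lr,
# regardless of the gradient's scale.
theta, m, v = np.zeros(1), np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, np.array([100.0]), m, v, t=1)
```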

Inherent Trade-Offs in the Fair Determination of Risk Scores

Some of the ways in which key notions of fairness are incompatible with each other are suggested, and hence a framework for thinking about the trade-offs between them is provided.

Distributed Representations of Words and Phrases and their Compositionality

This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
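Negative sampling scores the true (word, context) pair against a few sampled negative contexts instead of normalizing over the whole vocabulary. A minimal sketch of the per-pair objective (names and toy vectors are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_w, v_c, v_neg):
    """Skip-gram negative-sampling loss for one (word, context) pair
    against k sampled negative context vectors v_neg (shape (k, d))."""
    pos = np.log(sigmoid(v_w @ v_c))
    neg = np.sum(np.log(sigmoid(-(v_neg @ v_c))))
    return -(pos + neg)  # negative log-likelihood to minimize

# With all-zero vectors every sigmoid is 0.5, so the loss for one
# positive and two negatives is 3 * log 2.
loss = neg_sampling_loss(np.zeros(4), np.zeros(4), np.zeros((2, 4)))
```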

UCI Machine Learning Repository

  • 2007

  • Beutel, A.; Chen, J.; Zhao, Z.; and Chi, E. H. 2017. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint.
  • Lum, K., and Johndrow, J. 2016. A statistical framework for fair predictive algorithms.