Mitigating Unwanted Biases with Adversarial Learning

@inproceedings{Zhang2018MitigatingUB,
  title={Mitigating Unwanted Biases with Adversarial Learning},
  author={B. Zhang and Blake Lemoine and Margaret Mitchell},
  booktitle={Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society},
  year={2018}
}
Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are present in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income…
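The predictor/adversary training loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's exact setup: the toy data, variable names, and hyperparameters below are illustrative assumptions. The predictor's update follows the paper's scheme of removing the component of the task gradient that projects onto the adversary's gradient, then also stepping against the adversary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative): feature 0 drives the label y, while a
# protected attribute z spuriously shifts feature 1.
n = 2000
z = rng.integers(0, 2, n)                      # protected attribute
x = rng.normal(0.0, 1.0, (n, 2))
x[:, 1] += 1.5 * z                             # biased feature channel
y = (x[:, 0] + 0.5 * rng.normal(0.0, 1.0, n) > 0).astype(float)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w = np.zeros(2)     # predictor weights: y_hat = sigmoid(x @ w)
u = 0.0             # adversary weight: z_hat = sigmoid(u * logit)
lr, alpha = 0.1, 1.0

for _ in range(500):
    logit = x @ w
    y_hat = sigmoid(logit)
    z_hat = sigmoid(u * logit)

    # Adversary descends its own loss: predict z from the predictor's logit.
    grad_u = np.mean((z_hat - z) * logit)
    u -= lr * grad_u

    # Predictor gradient for the task loss ...
    g_y = x.T @ (y_hat - y) / n
    # ... and the gradient that would help the adversary.
    g_z = x.T @ ((z_hat - z) * u) / n

    # Remove the projection of g_y onto g_z, then also step against
    # the adversary's gradient (weighted by alpha).
    norm = g_z @ g_z
    proj = (g_y @ g_z) / norm * g_z if norm > 1e-12 else 0.0
    w -= lr * (g_y - proj - alpha * g_z)
```

Over training, the weight on the biased channel is pushed down because it is the component the adversary can exploit to recover z, while task accuracy is preserved by the unbiased channel.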
Generative Adversarial Networks for Mitigating Biases in Machine Learning Systems
Experimental results show that the proposed solution can efficiently mitigate different types of bias while also enhancing the prediction accuracy of the underlying machine learning model.
Efficiently Mitigating Classification Bias via Transfer Learning
Proposes the Upstream Bias Mitigation for Downstream Fine-Tuning (UBM) framework, which mitigates one or more bias factors in downstream classifiers via transfer learning from an upstream model.
Towards Learning an Unbiased Classifier from Biased Data via Conditional Adversarial Debiasing
Presents a novel adversarial debiasing method that addresses features spuriously correlated with the labels of training images but statistically independent of the labels of test images, a setting in which the automatic identification of relevant features during training is perturbed by irrelevant ones.
Fair Representation for Safe Artificial Intelligence via Adversarial Learning of Unbiased Information Bottleneck
Formulates non-discriminatory representation learning as a dual-objective optimization problem: encoding the data while obfuscating information about the protected features in the representation, by exploiting an unbiased information bottleneck.
Data Augmentation for Discrimination Prevention and Bias Disambiguation
Proposes a novel data augmentation technique that creates a fairer dataset for model training and can also help identify the type of bias in the dataset, i.e., whether it arises from a lack of representation for a particular group (sampling bias) or from human bias reflected in the labels (prejudice-based bias).
Latent Adversarial Debiasing: Mitigating Collider Bias in Deep Neural Networks
Argues that the cause of failure is a combination of the deep structure of neural networks and the greedy, gradient-driven learning process used, which prefers easy-to-compute signals when they are available.
Bias-Resilient Neural Network
Proposes a method based on adversarial training that learns discriminative features unbiased by and invariant to the confounder(s), using a new adversarial loss function that drives the correlation between the bias and the learned features toward zero.
Learning Fair Representations via an Adversarial Framework
Proposes a minimax adversarial framework with a generator that captures the data distribution and produces latent representations, and a critic that ensures the distributions across different protected groups are similar; the framework provides theoretical guarantees with respect to statistical parity and individual fairness.
Adversarial Removal of Demographic Attributes from Text Data
Shows that demographic information about authors is encoded in, and can be recovered from, the intermediate representations learned by text-based neural classifiers; the implication is that decisions of classifiers trained on textual data are not agnostic to, and likely condition on, demographic attributes.
Inherent Tradeoffs in Learning Fair Representations
Provides the first result that quantitatively characterizes the tradeoff between demographic parity and joint utility across population groups, and proves that if the optimal decision functions of different groups are close, then learning fair representations leads to an alternative notion of fairness known as accuracy parity.

References

(Showing 10 of 16 references.)
Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations
Uses an adversarial training procedure to remove information about the sensitive attribute from the latent representation learned by a neural network, and finds that the data distribution empirically drives the adversary's notion of fairness.
Generative Adversarial Nets
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a…
A statistical framework for fair predictive algorithms
Proposes a method to remove bias from predictive models by removing all information regarding protected variables from the permitted training data; the method is general enough to accommodate arbitrary data types, e.g., binary or continuous.
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Empirically demonstrates that the proposed algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
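The neutralize step at the core of this embedding-debiasing approach is a simple vector projection. The sketch below uses toy vectors; in practice the bias direction is estimated from differences of gendered word pairs such as vec("he") - vec("she").

```python
import numpy as np

def neutralize(v, bias_dir):
    """Remove the component of v that lies along the bias direction."""
    b = bias_dir / np.linalg.norm(bias_dir)
    return v - (v @ b) * b

# Illustrative values, not real embeddings.
bias_dir = np.array([1.0, 0.0, 0.0])
v = np.array([0.8, 0.3, 0.5])
v_db = neutralize(v, bias_dir)
```

After neutralizing, the vector is orthogonal to the bias direction while its other components are untouched, which is why properties like concept clustering survive.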
Equality of Opportunity in Supervised Learning
Proposes a criterion for discrimination with respect to a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features, and shows how to optimally adjust any learned predictor to remove discrimination according to this definition.
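One simple instance of this post-processing idea is equal opportunity: pick a per-group score threshold so that every group attains the same true-positive rate. The sketch below uses simulated scores with an artificial per-group shift; data, target rate, and names are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
g = rng.integers(0, 2, n)          # group membership
y = rng.integers(0, 2, n)          # true labels
# Scores are shifted per group to simulate a biased scorer.
s = rng.normal(y + 0.5 * g, 1.0)

# Equal-opportunity sketch: choose a per-group threshold so both
# groups hit the same true-positive rate (here 0.8).
target_tpr = 0.8
thresholds = {}
for grp in (0, 1):
    pos_scores = s[(g == grp) & (y == 1)]
    thresholds[grp] = np.quantile(pos_scores, 1 - target_tpr)

y_hat = s >= np.where(g == 1, thresholds[1], thresholds[0])
```

Because each threshold is the (1 - TPR) quantile of that group's positive-class scores, the fraction of positives accepted in each group lands at the target rate.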
Adam: A Method for Stochastic Optimization
Introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate comparable to the best known results in the online convex optimization framework.
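The Adam update rule is compact enough to sketch directly. The quadratic objective and step count below are illustrative; the moment estimates and bias corrections follow the published algorithm.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad**2        # second-moment estimate
    m_hat = m / (1 - b1**t)                # bias correction
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 (gradient 2w) from an illustrative start.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```

Because the effective step size is roughly lr regardless of gradient scale, the iterate moves toward the minimum at a steady pace early on.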
Inherent Trade-Offs in the Fair Determination of Risk Scores
Suggests some of the ways in which key notions of fairness are incompatible with each other, and hence provides a framework for thinking about the trade-offs between them.
Distributed Representations of Words and Phrases and their Compositionality
Presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.