Adversarial Removal of Demographic Attributes from Text Data

@inproceedings{Elazar2018AdversarialRO,
  title={Adversarial Removal of Demographic Attributes from Text Data},
  author={Yanai Elazar and Yoav Goldberg},
  booktitle={EMNLP},
  year={2018}
}
Recent advances in Representation Learning and Adversarial Training seem to succeed in removing unwanted features from the learned representation. [...] Key result: the main conclusion is a cautionary one; do not rely on adversarial training to achieve representations that are invariant to sensitive features.
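The setup the paper analyzes is the standard adversarial-removal recipe: an encoder and a main-task classifier are trained jointly with an adversary that tries to predict the demographic attribute from the encoded text, typically by passing the encoder output through a gradient-reversal layer before the adversary. The sketch below is a minimal PyTorch illustration of that recipe, not the authors' implementation; the model and variable names, the bag-of-words encoder, and hyperparameters such as lam are assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of adversarial removal of a protected
# attribute from a text representation. A gradient-reversal layer flips the
# gradient flowing from the adversary into the encoder, so the encoder is
# pushed to make the protected attribute unpredictable while still solving
# the main task. All names and hyperparameters here are illustrative.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam backward."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class AdversarialRemovalModel(nn.Module):
    def __init__(self, vocab_size=10_000, emb_dim=100, hid_dim=128,
                 n_task_labels=2, n_protected_labels=2, lam=1.0):
        super().__init__()
        self.lam = lam
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim)        # simple bag-of-words text encoder
        self.encoder = nn.Sequential(nn.Linear(emb_dim, hid_dim), nn.ReLU())
        self.task_head = nn.Linear(hid_dim, n_task_labels)       # main task head
        self.adv_head = nn.Linear(hid_dim, n_protected_labels)   # adversary predicting the attribute

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        task_logits = self.task_head(h)
        adv_logits = self.adv_head(GradReverse.apply(h, self.lam))
        return task_logits, adv_logits


# One toy training step: both heads minimize cross-entropy, but the reversed
# gradient makes the encoder work against the adversary.
model = AdversarialRemovalModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.randint(0, 10_000, (32, 50))   # batch of 32 texts, 50 token ids each
y_task = torch.randint(0, 2, (32,))           # main-task labels
y_prot = torch.randint(0, 2, (32,))           # protected-attribute labels

task_logits, adv_logits = model(tokens)
loss = nn.functional.cross_entropy(task_logits, y_task) \
     + nn.functional.cross_entropy(adv_logits, y_prot)
opt.zero_grad()
loss.backward()
opt.step()
```

Note that, per the paper's key result, an attacker classifier retrained on the frozen encodings may still recover the attribute even when the in-training adversary performs at chance level.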
Adversarial Removal of Gender from Deep Image Representations
TLDR
This work provides convincing, interpretable visual evidence through an autoencoder-augmented model that the approach performs semantically meaningful removal of gender features, and thus can also be used to remove gender attributes directly from images.
Adversarial Removal of Demographic Attributes Revisited
TLDR
It is shown that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples, indicating that it relies on correlations specific to its particular data sample.
Gone at Last: Removing the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training
TLDR
It is shown that using an ensemble of adversaries can prevent the bias from being relearned after the model training is completed, further improving how well the model generalises to different NLI datasets.
Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
TLDR
It is shown that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased datasets, and an adversarial approach is adopted to remove unwanted features corresponding to protected variables from intermediate representations in a deep neural network.
Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training
TLDR
It is shown that the bias can be reduced in the sentence representations by using an ensemble of adversaries, encouraging the model to jointly decrease the accuracy of these different adversaries while fitting the data.
Bias-Resilient Neural Network
TLDR
A method based on the adversarial training strategy to learn discriminative features that are unbiased and invariant to the confounder(s), by incorporating a new adversarial loss function that encourages a vanishing correlation between the bias and the learned features (a simplified sketch of such a correlation penalty appears after this list).
Adversarial Representation Learning with Closed-Form Solvers
Adversarial representation learning aims to learn data representations for a target task while removing unwanted sensitive information at the same time. Existing methods learn model parameters [...]
Robust Semantic Parsing with Adversarial Learning for Domain Generalization
TLDR
It is shown that adversarial learning yields improved results when using explicit domain classification as the adversarial task, and an unsupervised domain discovery approach is proposed that yields equivalent improvements.
Adversarial Training for Satire Detection: Controlling for Confounding Variables
TLDR
This work proposes a novel model for satire detection with an adversarial component to control for the confounding variable of publication source and shows that the adversarial part is crucial for the model to learn to pay attention to linguistic properties of satire.
Disentangling Document Topic and Author Gender in Multiple Languages: Lessons for Adversarial Debiasing
TLDR
The findings are: individual classifiers for topic and author gender are indeed biased; debiasing with adversarial training works for topic, but breaks down for author gender; and gender-debiasing results differ across languages.
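The Bias-Resilient Neural Network entry above describes an adversarial loss that encourages the correlation between the bias and the learned features to vanish. The sketch below is a simplified, hypothetical variant of that idea: instead of training a separate bias predictor as in that paper, it directly penalizes the squared Pearson correlation between each feature dimension and the confounding variable, alongside the task loss. The names corr_penalty and lambda_corr are illustrative assumptions, not names from the paper.

```python
# Simplified sketch (not from any paper above) of a vanishing-correlation
# penalty: in addition to the task loss, the encoder is penalized for any
# linear correlation between its features and a confounding variable.
import torch
import torch.nn as nn

def corr_penalty(features: torch.Tensor, confounder: torch.Tensor) -> torch.Tensor:
    """Mean squared Pearson correlation between each feature dim and the confounder."""
    f = features - features.mean(dim=0, keepdim=True)   # (batch, dim), centered
    c = confounder.float().view(-1, 1)
    c = c - c.mean()
    cov = (f * c).mean(dim=0)                            # per-dimension covariance
    corr = cov / (f.std(dim=0) * c.std() + 1e-8)         # per-dimension correlation
    return (corr ** 2).mean()

encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU())
task_head = nn.Linear(64, 2)
opt = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)

x = torch.randn(128, 20)            # toy inputs
y = torch.randint(0, 2, (128,))     # task labels
z = torch.randint(0, 2, (128,))     # confounder / protected variable
lambda_corr = 1.0                   # trade-off weight (assumed)

h = encoder(x)
loss = nn.functional.cross_entropy(task_head(h), y) + lambda_corr * corr_penalty(h, z)
opt.zero_grad()
loss.backward()
opt.step()
```

A direct penalty of this form only targets linear, per-dimension dependence; the papers listed above pair such objectives with learned adversaries or bias-prediction networks trained on real data.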

References

Showing 1-10 of 43 references
Aspect-augmented Adversarial Networks for Domain Adaptation
TLDR
A neural method for transfer learning between two (source and target) classification tasks or aspects over the same domain is introduced, using a few keywords pertaining to the source and target aspects to indicate sentence relevance, instead of document class labels.
Learning Anonymized Representations with Adversarial Neural Networks
TLDR
A novel training objective is introduced for simultaneously training a predictor over target variables of interest (the regular labels) while preventing an intermediate representation from being predictive of the private labels.
Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations
TLDR
An adversarial training procedure is used to remove information about the sensitive attribute from the latent representation learned by a neural network, and it is shown empirically that the data distribution drives the adversary's notion of fairness.
Mitigating Unwanted Biases with Adversarial Learning
TLDR
This work presents a framework for mitigating biases concerning demographic groups by including a variable for the group of interest and simultaneously learning a predictor and an adversary, which results in accurate predictions that exhibit less evidence of stereotyping the protected variable Z.
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification
TLDR
An Adversarial Deep Averaging Network (ADAN) is proposed to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist.
Controllable Invariance through Adversarial Feature Learning
TLDR
This paper shows that the proposed framework induces an invariant representation, and leads to better generalization evidenced by the improved performance on three benchmark tasks.
Censoring Representations with an Adversary
TLDR
This work formulates the adversarial model as a minimax problem, optimizes that minimax objective with an alternating stochastic gradient min-max procedure, and demonstrates the ability to provide discrimination-free representations on standard test problems, with comparisons to previous state-of-the-art methods for fairness (a minimal sketch of this alternating optimization appears at the end of this reference list).
Generative Adversarial Nets
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Towards Robust and Privacy-preserving Text Representations
TLDR
This paper proposes an approach to explicitly obscure important author characteristics at training time, such that representations learned are invariant to these attributes, which leads to increased privacy in the learned representations.
The Variational Fair Autoencoder
TLDR
This model is based on a variational autoencoding architecture with priors that encourage independence between sensitive and latent factors of variation, and is shown to be more effective than previous work in removing unwanted sources of variation while maintaining informative latent representations.
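The "Censoring Representations with an Adversary" reference above formulates removal as a minimax problem optimized by alternating stochastic gradient updates between the model and the adversary, a recipe shared by several of the other references. The sketch below is a minimal, hypothetical illustration of one such alternating training step; the module names (encoder, predictor, adversary) and the adv_weight trade-off are assumptions, not the paper's implementation.

```python
# Minimal sketch of alternating min-max adversarial training: the adversary is
# updated to predict the sensitive attribute from the representation, then the
# encoder/predictor are updated to solve the task while making the adversary
# fail. All module and variable names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU())
predictor = nn.Linear(64, 2)    # main task head
adversary = nn.Linear(64, 2)    # tries to recover the sensitive attribute

opt_model = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
adv_weight = 1.0                # min-max trade-off (assumed)

def train_step(x, y_task, s_sensitive):
    # 1) Max step: update the adversary on a detached representation.
    with torch.no_grad():
        h_detached = encoder(x)
    adv_loss = F.cross_entropy(adversary(h_detached), s_sensitive)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Min step: update encoder + predictor to solve the task while
    #    increasing the adversary's loss (the adversary's optimizer is not
    #    stepped here, so its parameters stay fixed).
    h = encoder(x)
    task_loss = F.cross_entropy(predictor(h), y_task)
    fool_loss = F.cross_entropy(adversary(h), s_sensitive)
    loss = task_loss - adv_weight * fool_loss
    opt_model.zero_grad()
    loss.backward()
    opt_model.step()
    return task_loss.item(), adv_loss.item()

# Toy usage
x = torch.randn(64, 20)
y = torch.randint(0, 2, (64,))
s = torch.randint(0, 2, (64,))
train_step(x, y, s)
```

In practice the number of adversary updates per model update and the value of adv_weight strongly affect how much information is actually removed, which connects back to the cautionary conclusion of the main paper above.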