Diverse Adversaries for Mitigating Bias in Training

@inproceedings{Han2021DiverseAF,
  title={Diverse Adversaries for Mitigating Bias in Training},
  author={Xudong Han and Timothy Baldwin and Trevor Cohn},
  booktitle={EACL},
  year={2021}
}
Adversarial learning can learn fairer and less biased models of language processing than standard training. However, current adversarial techniques only partially mitigate the problem of model bias, and their training procedures are often unstable. In this paper, we propose a novel approach to adversarial learning based on the use of multiple diverse discriminators, whereby discriminators are encouraged to learn orthogonal hidden representations from one another. Experimental results…
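The core idea is compact enough to sketch. Below is a minimal PyTorch illustration of adversarial debiasing with several diverse discriminators; the module names, the squared cross-correlation penalty, and all hyperparameters are illustrative assumptions, not the paper's reference implementation.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, dim, n_protected, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
        self.out = nn.Linear(hidden, n_protected)

    def forward(self, h):
        z = self.body(h)   # hidden representation used for the diversity term
        return self.out(z), z

def difference_loss(discriminators, h):
    # Penalise overlap between the discriminators' hidden representations,
    # pushing each adversary to capture a different view of the protected attribute.
    zs = [d(h)[1] for d in discriminators]
    loss = h.new_zeros(())
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            loss = loss + (zs[i].transpose(0, 1) @ zs[j]).pow(2).sum()
    return loss

In training, the main model would minimise its task loss while playing against the discriminators (via gradient reversal or alternating updates), and the difference term above keeps the adversaries from collapsing onto the same features.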

Citations

Towards Equal Opportunity Fairness through Adversarial Learning
TLDR
Experimental results over two datasets show that the augmented discriminator for adversarial training substantially improves over standard adversarial debiasing methods, in terms of the performance–fairness trade-off.
Decoupling Adversarial Training for Fair NLP
TLDR
This paper proposes a training strategy which needs only a small volume of protected labels in adversarial training, incorporating an estimation method to transfer private-labelled instances from one dataset to another.
Contrastive Learning for Fair Representations
TLDR
This paper proposes a method for mitigating bias in classifier training by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations, while instances sharing a protected attribute are forced further apart.
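This objective also admits a short sketch. The toy loss below assumes a margin-based formulation (the margin value and the Euclidean distance are my assumptions, not necessarily the paper's): same-class pairs are pulled together, same-protected-group pairs pushed apart.

import torch
import torch.nn.functional as F

def fair_contrastive_loss(h, y, g, margin=1.0):
    # h: (batch, dim) representations; y: class labels; g: protected labels.
    d = torch.cdist(h, h)                                    # pairwise distances
    eye = torch.eye(len(h), device=h.device)
    same_y = (y[:, None] == y[None, :]).float() - eye        # same class, excluding self
    same_g = (g[:, None] == g[None, :]).float() - eye        # same group, excluding self
    attract = same_y.clamp(min=0) * d.pow(2)                 # pull same-class pairs together
    repel = same_g.clamp(min=0) * F.relu(margin - d).pow(2)  # push same-group pairs apart
    return attract.mean() + repel.mean()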
To train or not to train adversarially: A study of bias mitigation strategies for speaker recognition
TLDR
This paper systematically evaluates the biases present in speaker recognition systems with respect to gender across a range of system operating points and proposes adversarial and multi-task learning techniques to improve the fairness of these systems.
Optimising Equal Opportunity Fairness in Model Training
TLDR
This work proposes two novel training objectives which directly optimise for the widely-used criterion of equal opportunity, and shows that they are effective in reducing bias while maintaining high performance over two classification tasks.
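For context, equal opportunity requires the true positive rate to be equal across protected groups; the commonly reported gap, stated here as the standard definition rather than quoted from this paper, is

\mathrm{TPR}_a = P(\hat{Y} = 1 \mid Y = 1, A = a), \qquad \mathrm{GAP}_{\mathrm{EO}} = \bigl| \mathrm{TPR}_0 - \mathrm{TPR}_1 \bigr|

for a binary protected attribute A.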
Fair NLP Models with Differentially Private Text Encoders
TLDR
This work proposes FEDERATE, an approach that combines ideas from differential privacy and adversarial training to learn private text representations which also induces fairer models, and empirically evaluates the trade-off between the privacy of the representations and the fairness and accuracy of the downstream model on four NLP datasets.
Fairness-aware Class Imbalanced Learning
TLDR
This work evaluates long-tail learning methods for tweet sentiment and occupation classification, and extends a margin-loss based approach with methods to enforce fairness, and empirically shows that the proposed approaches help mitigate both class imbalance and demographic biases.
FairGrad: Fairness Aware Gradient Descent
TLDR
This work proposes FairGrad, a method to enforce fairness via a reweighting scheme that iteratively learns group-specific weights based on whether each group is advantaged or not; it is comparable to standard baselines over various datasets, including ones used in natural language processing and computer vision.
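The reweighting idea can be sketched as follows; the exponentiated update and the step size eta are illustrative assumptions of mine, not FairGrad's actual rule, which the paper specifies exactly.

import torch

def update_group_weights(weights, violations, eta=0.1):
    # weights: (n_groups,) current multipliers; violations: (n_groups,) signed
    # fairness violation per group (positive = currently disadvantaged).
    weights = weights * torch.exp(eta * violations)
    return weights * len(weights) / weights.sum()   # renormalise to mean 1

def weighted_loss(per_example_loss, groups, weights):
    # Upweight examples from disadvantaged groups in the training loss.
    return (per_example_loss * weights[groups]).mean()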
Conditional Supervised Contrastive Learning for Fair Text Classification
TLDR
This work theoretically analyzes the connections between learning representations under fairness constraints and conditional supervised contrastive objectives, and proposes using the latter to learn fair representations for text classification.
Balancing out Bias: Achieving Fairness Through Balanced Training
TLDR
This paper introduces a simple but highly effective objective for countering bias: balanced training with a gated model that takes protected attributes as input. The approach is shown to reduce bias in predictions under demographic input perturbation, outperforming all other bias mitigation techniques when combined with balanced training.
...

References

Showing 1-10 of 16 references
Adversarial Removal of Demographic Attributes from Text Data
TLDR
It is shown that demographic information of authors is encoded in—and can be recovered from—the intermediate representations learned by text-based neural classifiers, and the implication is that decisions of classifiers trained on textual data are not agnostic to—and likely condition on—demographic attributes.
What’s in a Domain? Learning Domain-Robust Text Representations using Adversarial Training
TLDR
This work proposes a novel method to optimise both in- and out-of-domain accuracy based on joint learning of a structured neural model with domain-specific and domain-general components, coupled with adversarial training for domain.
Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
TLDR
It is shown that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased datasets, and an adversarial approach is adopted to remove unwanted features corresponding to protected variables from intermediate representations in a deep neural network.
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
TLDR
This work presents Iterative Null-space Projection (INLP), a novel method for removing information from neural representations based on repeated training of linear classifiers that predict a certain property the authors aim to remove, followed by projection of the representations on their null-space.
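INLP is compact enough to sketch directly. The simplified version below handles a binary protected attribute; the classifier choice and iteration count are assumptions, and the published method includes further details.

import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, z, n_iters=20):
    # X: (n, d) representations; z: (n,) binary protected labels.
    P = np.eye(X.shape[1])
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(X @ P, z)
        w = clf.coef_ / np.linalg.norm(clf.coef_)   # (1, d) unit direction found by the probe
        P = (np.eye(X.shape[1]) - w.T @ w) @ P      # project that direction out
    return X @ P, P

Each iteration removes one linear direction that still predicts the attribute, so after enough rounds no linear probe can recover it.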
Towards Robust and Privacy-preserving Text Representations
TLDR
This paper proposes an approach to explicitly obscure important author characteristics at training time, such that representations learned are invariant to these attributes, which leads to increased privacy in the learned representations.
Factorized Orthogonal Latent Spaces
TLDR
This paper proposes a robust approach to factorizing the latent space into shared and private spaces by introducing orthogonality constraints, which penalize redundant latent representations.
Domain Separation Networks
TLDR
The novel architecture results in a model that outperforms the state-of-the-art on a range of unsupervised domain adaptation scenarios and additionally produces visualizations of the private and shared representations enabling interpretation of the domain adaptation process.
Unsupervised Domain Adaptation by Backpropagation
TLDR
The method performs very well in a series of image classification experiments, achieving a strong adaptation effect in the presence of large domain shifts and outperforming the previous state of the art on the Office datasets.
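The mechanism behind this method, the gradient reversal layer, is a short idiom in PyTorch (this is the widely used community formulation, not code from the paper): the layer is the identity on the forward pass and negates, and optionally scales, the gradient on the backward pass, so the encoder learns to fool the domain classifier.

import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)          # identity in the forward direction

    @staticmethod
    def backward(ctx, grad_output):
        # Flip and scale the gradient flowing back into the encoder.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)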
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
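For reference, the Adam update maintains bias-corrected moving averages of the gradient g_t and its elementwise square, with the paper's defaults \alpha = 0.001, \beta_1 = 0.9, \beta_2 = 0.999, \epsilon = 10^{-8}:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
\theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)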
Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems
TLDR
The Equity Evaluation Corpus (EEC) is presented, which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders, and it is found that several of the systems show statistically significant bias.
...