Corpus ID: 225068149

Efficiently Mitigating Classification Bias via Transfer Learning

Xisen Jin, Francesco Barbieri, Aida Mostafazadeh Davani, Brendan Kennedy, Leonardo Neves, Xiang Ren
Prediction bias in machine learning models refers to unintended model behaviors that discriminate against inputs mentioning or produced by certain groups; for example, hate speech classifiers predict more false positives for neutral text mentioning specific social groups. Mitigating bias for each task or domain is inefficient, as it requires repetitive model training, data annotation (e.g., demographic information), and evaluation. In pursuit of a more accessible solution, we propose the…
1 Citation


Confronting Abusive Language Online: A Survey from the Ethical and Human Rights Perspective
Several opportunities for rights-respecting, socio-technical solutions to detect and confront online abuse are identified, including ‘nudging’, ‘quarantining’, value sensitive design, counter-narratives, style transfer, and AI-driven public education applications.


References

Mitigating Unwanted Biases with Adversarial Learning
This work presents a framework for mitigating biases concerning demographic groups by including a variable for the group of interest and simultaneously learning a predictor and an adversary, which results in accurate predictions that exhibit less evidence of stereotyping the protected variable Z.
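The predictor–adversary setup can be sketched in a few lines. Below is a toy NumPy simulation, not the paper's implementation: a linear predictor is trained on a task while a linear adversary tries to recover the protected attribute z from the predictor's score, and the predictor descends its task loss while ascending the adversary's loss. All data, variable names, and hyperparameters are invented for illustration.

```python
import numpy as np

# Toy sketch of adversarial debiasing (illustrative, not the paper's code):
# the predictor minimizes its squared task loss while *reversing* the
# gradient of an adversary that predicts z from the predictor's score.

rng = np.random.default_rng(2)
n = 200
z = rng.integers(0, 2, n).astype(float)         # protected attribute
x = np.stack([rng.normal(size=n) + 2.0 * z,     # feature that leaks z
              rng.normal(size=n)], axis=1)
y = (x[:, 0] + x[:, 1] > 1.0).astype(float)     # task label, correlated with z

w = np.zeros(2)      # predictor weights
u = 0.0              # adversary weight on the predictor's score
alpha, lam = 0.05, 0.3

for _ in range(50):
    s = x @ w                                    # predictor scores
    grad_task = x.T @ (s - y) / n                # task squared-loss gradient
    grad_adv_w = u * x.T @ (u * s - z) / n       # adversary loss w.r.t. w
    u -= alpha * (s @ (u * s - z) / n)           # adversary learns to predict z
    w -= alpha * (grad_task - lam * grad_adv_w)  # reversed adversary gradient
```

The minus sign in front of `lam * grad_adv_w` is the whole trick: the predictor is pushed toward solutions from which the adversary cannot recover z.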
Measuring and Mitigating Unintended Bias in Text Classification
A new approach to measuring and mitigating unintended bias in machine learning models is introduced, using a set of common demographic identity terms as the subset of input features on which to measure bias.
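As a concrete illustration of the identity-term approach, here is a minimal sketch in Python. The templates, term list, and the deliberately biased toy classifier are all invented for illustration, not taken from the paper:

```python
# Sketch: measuring unintended bias via identity-term templates.
# TEMPLATES, IDENTITY_TERMS, and toy_classifier are illustrative.

TEMPLATES = ["I am a {} person", "My friend is {}"]
IDENTITY_TERMS = ["gay", "straight", "muslim", "christian"]

def toy_classifier(text):
    # Stand-in for a trained model: flags any sentence containing
    # "gay" or "muslim" as toxic -- a deliberately biased toy.
    return any(w in text for w in ("gay", "muslim"))

def false_positive_rate(term):
    # Every templated sentence is neutral, so every positive
    # prediction is a false positive.
    sents = [t.format(term) for t in TEMPLATES]
    return sum(toy_classifier(s) for s in sents) / len(sents)

fpr = {term: false_positive_rate(term) for term in IDENTITY_TERMS}
# A large FPR gap between identity terms signals unintended bias.
bias_gap = max(fpr.values()) - min(fpr.values())
```

Here the toy model's gap is maximal: neutral sentences mentioning some identities are always flagged, others never.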
Explicit Inductive Bias for Transfer Learning with Convolutional Networks
This paper investigates several regularization schemes that explicitly promote the similarity of the final solution with the initial model, and ultimately recommends a simple $L^2$ penalty, with the pre-trained model as the reference, as the baseline penalty for transfer learning tasks.
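The recommended penalty (often called L2-SP) is easy to state: penalize the squared distance to the pre-trained weights rather than to the origin. A minimal NumPy sketch, with illustrative weights and hyperparameters:

```python
import numpy as np

# Sketch of the L2-SP idea: regularize toward the pre-trained weights
# w_pre instead of toward zero. Weights and lam are illustrative.

rng = np.random.default_rng(0)
w_pre = rng.normal(size=5)          # pre-trained reference weights
w = w_pre + rng.normal(size=5)      # current fine-tuned weights
lam = 0.1

def l2_sp_penalty(w, w_pre, lam):
    # Standard weight decay would use np.sum(w ** 2) instead.
    return lam * np.sum((w - w_pre) ** 2)

def l2_sp_grad(w, w_pre, lam):
    # Gradient pulls w back toward the pre-trained solution.
    return 2.0 * lam * (w - w_pre)

# One gradient step on the penalty alone shrinks the distance to w_pre.
w_next = w - 0.5 * l2_sp_grad(w, w_pre, lam)
```

The design intuition: fine-tuning starts at a good solution, so the prior should be centered there, not at zero.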
Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations
An adversarial training procedure is used to remove information about the sensitive attribute from the latent representation learned by a neural network, and the data distribution empirically drives the adversary's notion of fairness.
Adversarially robust transfer learning
This work considers robust transfer learning, in which not only performance but also robustness is transferred from a source model to a target domain, and shows that it can improve the generalization of adversarially trained models while maintaining their robustness.
Domain-Adversarial Training of Neural Networks
A new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions; it can be implemented in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.
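The gradient reversal layer itself is tiny: identity on the forward pass, negated (and optionally scaled) gradient on the backward pass. A hand-rolled NumPy sketch, assuming manual backpropagation rather than a real autograd framework:

```python
import numpy as np

# Sketch of a gradient reversal layer (GRL) as explicit forward/backward
# functions; in a real framework this would be a custom autograd op.
# The scale lam is the usual GRL trade-off coefficient.

def grl_forward(x):
    # Identity in the forward pass.
    return x

def grl_backward(grad_output, lam=1.0):
    # Negate (and scale) the gradient in the backward pass, so the
    # feature extractor is updated to *confuse* the domain classifier.
    return -lam * grad_output

x = np.array([1.0, -2.0, 3.0])
y = grl_forward(x)
g = grl_backward(np.ones_like(x), lam=0.5)
```

Placed between the feature extractor and the domain classifier, this single sign flip turns ordinary backprop into adversarial domain training.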
Investigating Gender Bias in BERT
This paper focuses on a popular CLM, BERT, and proposes an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer, which obviates the need of realizing gender subspace in multiple dimensions and prevents other crucial information from being omitted.
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
This work presents Iterative Null-space Projection (INLP), a novel method for removing information from neural representations based on repeated training of linear classifiers that predict a certain property the authors aim to remove, followed by projection of the representations on their null-space.
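A minimal sketch of the INLP loop, with two simplifications relative to the paper: the protected-attribute predictor is a least-squares direction rather than a trained linear classifier, and the data are synthetic:

```python
import numpy as np

# Illustrative INLP sketch: repeatedly find a linear direction that
# predicts the protected label z, then project the representations
# onto the nullspace of that direction.

def nullspace_projection(w):
    # P = I - w w^T / ||w||^2 removes the component along w.
    w = w / np.linalg.norm(w)
    return np.eye(len(w)) - np.outer(w, w)

def inlp(X, z, n_iters=3):
    X = X.copy()
    for _ in range(n_iters):
        # Least-squares direction predicting z (a stand-in for the
        # paper's trained linear classifier).
        w, *_ = np.linalg.lstsq(X, z, rcond=None)
        if np.linalg.norm(w) < 1e-8:
            break  # nothing predictive left to remove
        X = X @ nullspace_projection(w)
    return X

rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=100).astype(float)
X = rng.normal(size=(100, 4))
X[:, 0] += 3.0 * z            # representations that leak z

X_clean = inlp(X, z)
# After projection, a linear fit recovers z much less well.
residual_w, *_ = np.linalg.lstsq(X_clean, z, rcond=None)
```

Each projection removes one linearly predictive direction, which is why the procedure must iterate: new, weaker directions can remain after the first pass.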
Reducing Gender Bias in Abusive Language Detection
Three mitigation methods, including debiased word embeddings, gender swap data augmentation, and fine-tuning with a larger corpus, can effectively reduce model bias by 90-98% and can be extended to correct model bias in other scenarios.
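Gender-swap augmentation can be sketched in a few lines; the word-pair list below is a tiny illustrative sample, not the resource used in the paper:

```python
# Sketch of gender-swap data augmentation: swap gendered word pairs to
# generate counterfactual training examples. SWAP is illustrative.

SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "man": "woman", "woman": "man"}

def gender_swap(sentence):
    # Token-level swap; a real implementation would also handle
    # casing, morphology, and proper names.
    return " ".join(SWAP.get(tok, tok) for tok in sentence.split())

def augment(corpus):
    # Keep each original and add its gender-swapped counterpart,
    # doubling the corpus.
    return corpus + [gender_swap(s) for s in corpus]

data = ["she is a doctor", "he hates his job"]
augmented = augment(data)
```

Training on the augmented corpus encourages the classifier to score a sentence the same regardless of the gendered terms it mentions.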
Transfer of Machine Learning Fairness across Domains
This work offers new theoretical guarantees for improving fairness across domains, along with a modeling approach for transfer to data-sparse target domains, and gives empirical results validating the theory and showing that these modeling approaches can improve fairness metrics with less data.