• Corpus ID: 246634065

Fair Interpretable Representation Learning with Correction Vectors

Mattia Cerrato, A Coronel, Marius Köppel, Alexander Segner, Roberto Esposito, Stefan Kramer

Neural network architectures have been extensively employed in the fair representation learning setting, where the objective is to learn a new representation for a given vector which is independent of sensitive information. Various representation debiasing techniques have been proposed in the literature. However, as neural networks are inherently opaque, these methods are hard to comprehend, which limits their usefulness. We propose a new framework for fair representation learning that is… 

Invariant Representations with Stochastically Quantized Neural Networks

This paper employs stochastically activated binary neural networks to compute (rather than bound) the mutual information between a layer and a sensitive attribute, uses this quantity as a regularization term during gradient descent, and shows that the learned representations display a higher level of invariance than those of full-precision neural networks.
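For discrete activations, mutual information can be computed directly from joint frequencies rather than bounded. The following is a minimal sketch of a plug-in estimate from paired samples; it is not the paper's procedure, and the function name `mutual_information` and its arguments are hypothetical.

```python
import math
from collections import Counter


def mutual_information(z, s):
    """Plug-in estimate of I(Z; S) in nats from paired discrete samples.

    z: sequence of discrete activation values (e.g. 0/1 for one binary unit)
    s: sequence of sensitive attribute values, same length as z
    """
    if len(z) != len(s) or not z:
        raise ValueError("z and s must be non-empty and of equal length")
    n = len(z)
    joint = Counter(zip(z, s))  # counts of each (z, s) pair
    count_z = Counter(z)        # marginal counts of z
    count_s = Counter(s)        # marginal counts of s
    mi = 0.0
    for (zi, si), c in joint.items():
        # p(z, s) / (p(z) * p(s)) simplifies to c * n / (count_z * count_s)
        mi += (c / n) * math.log(c * n / (count_z[zi] * count_s[si]))
    return mi
```

A value of zero means the sampled activations carry no information about the sensitive attribute; a regularizer of this kind would penalize larger values.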

Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey

This paper investigates how existing bias mitigation methods are evaluated in the literature, and considers datasets, metrics and benchmarking to support practitioners in making informed choices when developing and evaluating new bias mitigation methods.

Constraining deep representations with a noise module for fair classification

This paper builds on a domain adaptation neural model by augmenting it with a "noise conditioning" mechanism, which is instrumental in obtaining fair representations, and provides experiments showing that the mechanism helps the networks ignore the sensitive attribute.


This model is based on a variational autoencoding architecture with priors that encourage independence between sensitive and latent factors of variation with an additional penalty term based on the “Maximum Mean Discrepancy” (MMD) measure.
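As an illustration of the kind of penalty described above, here is a minimal sketch of a biased estimate of the squared Maximum Mean Discrepancy between two samples. The RBF kernel, the `gamma` bandwidth, and the function names are assumptions made for this example, not details taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2) between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two samples of vectors.

    xs, ys: non-empty lists of equal-dimension vectors, e.g. latent codes
    of the two groups defined by a binary sensitive attribute (assumed setup).
    """
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, gamma) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is close to zero when the two samples look alike under the kernel and grows as the groups' latent distributions separate, which is what makes it usable as a penalty term.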

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
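The core idea of residual learning is that a block outputs its input plus a learned residual, y = x + F(x). A minimal sketch on plain Python lists follows; `residual_block` and `residual_fn` are hypothetical names, and `residual_fn` stands in for whatever layers a real network would use.

```python
def residual_block(x, residual_fn):
    """Return x + F(x), where F (residual_fn) is the learned residual mapping."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        # The identity shortcut requires matching dimensions.
        raise ValueError("residual_fn must preserve the input dimension")
    return [xi + fi for xi, fi in zip(x, fx)]
```

If the residual mapping outputs zeros, the block reduces to the identity, which is one common explanation for why very deep stacks of such blocks remain easy to optimize.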

Domain-Adversarial Training of Neural Networks

This paper introduces a representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. The approach can be implemented in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.
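A gradient reversal layer acts as the identity in the forward pass and flips the sign of the gradient in the backward pass. The sketch below shows this with explicit `forward` and `backward` methods on plain lists; the class name and the scaling factor `lam` are assumptions for illustration, and a real implementation would hook into an autodiff framework instead.

```python
class GradientReversal:
    """Identity on the forward pass; negated, scaled gradient on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # assumed trade-off factor applied to the reversed gradient

    def forward(self, x):
        # Features pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # Layers upstream receive the negated gradient, so they are pushed to
        # increase the loss of the branch attached after this layer.
        return [-self.lam * g for g in grad_output]
```

Placed between a feature extractor and a domain (or sensitive attribute) classifier, such a layer lets the classifier minimize its loss while the extractor is trained to make that task harder.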

Controllable Invariance through Adversarial Feature Learning

This paper shows that the proposed framework induces an invariant representation and leads to better generalization, as evidenced by improved performance on three benchmark tasks.

NICE: Non-linear Independent Components Estimation

We propose a deep learning framework for modeling complex high-dimensional densities called Non-linear Independent Component Estimation (NICE). It is based on the idea that a good representation is

Learning Fair Representations

We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the

Fair pairwise learning to rank

This paper presents a family of fair pairwise learning-to-rank approaches based on neural networks, which produce balanced outcomes for underprivileged groups and, at the same time, build fair representations of the data, i.e. new vectors that are uncorrelated with a sensitive attribute.

Invariant Representations without Adversarial Training

It is shown that adversarial training is unnecessary and sometimes counter-productive; this work casts invariant representation learning as a single information-theoretic objective that can be directly optimized.

Deep Domain Confusion: Maximizing for Domain Invariance

This work proposes a new CNN architecture that introduces an adaptation layer and an additional domain confusion loss in order to learn a representation that is both semantically meaningful and domain invariant. It also shows that a domain confusion metric can be used for model selection, to determine the dimension of the adaptation layer and its best position in the CNN architecture.