• Corpus ID: 24990444

Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations

Alex Beutel, Jilin Chen, Zhe Zhao, Ed H. Chi
How can we learn a classifier that is "fair" for a protected or sensitive group when we do not know if the input to the classifier belongs to the protected group? Here, we use an adversarial training procedure to remove information about the sensitive attribute from the latent representation learned by a neural network. In particular, we study how the choice of data for the adversarial training affects the resulting fairness properties. We find two interesting results: a small amount of data is…

Costs and Benefits of Fair Representation Learning

It is shown that using fair representation learning as an intermediate step in fair classification incurs a cost compared to directly solving the problem, which is referred to as the cost of mistrust, and the benefits of fair representation learning are quantified by showing that any subsequent use of the cleaned data will not be too unfair.

Mitigating Unwanted Biases with Adversarial Learning

This work presents a framework for mitigating biases concerning demographic groups by including a variable for the group of interest and simultaneously learning a predictor and an adversary, which results in accurate predictions that exhibit less evidence of stereotyping the protected variable Z.

Towards Fair Classifiers Without Sensitive Attributes: Exploring Biases in Related Features

A novel framework is proposed which simultaneously uses these related features for accurate prediction and enforces fairness and can dynamically adjust the regularization weight of each related feature to balance its contribution on model classification and fairness.

Learning Fair Models without Sensitive Attributes: A Generative Approach

A probabilistic generative framework is proposed to effectively estimate the sensitive attribute from the training data with relevant features in various formats and utilize the estimated sensitive attribute information to learn fair models.

Transfer of Machine Learning Fairness across Domains

This work offers new theoretical guarantees of improving fairness across domains, presents a modeling approach for transfer to data-sparse target domains, and gives empirical results that validate the theory and show that these modeling approaches can improve fairness metrics with less data.

Learning Fair Representations via an Adversarial Framework

A minimax adversarial framework, with a generator that captures the data distribution and generates latent representations and a critic that ensures the distributions across different protected groups are similar, provides a theoretical guarantee with respect to statistical parity and individual fairness.

Imparting Fairness to Pre-Trained Biased Representations

This paper first studies the "linear" form of the adversarial representation learning problem, and obtains an exact closed-form expression for its global optima through spectral learning and extends this solution and analysis to non-linear functions through kernel representation.

Inherent Tradeoffs in Learning Fair Representations

This paper provides the first result that quantitatively characterizes the tradeoff between demographic parity and the joint utility across different population groups and proves that if the optimal decision functions across different groups are close, then learning fair representations leads to an alternative notion of fairness, known as the accuracy parity.

You Can Still Achieve Fairness Without Sensitive Attributes: Exploring Biases in Non-Sensitive Features

A novel framework is proposed which simultaneously uses these related features for accurate prediction and regularizes the model to be fair, and the model can dynamically adjust the importance weight of each related feature to balance the contribution of the feature on model classification and fairness.

Discovering Fair Representations in the Data Domain

This work proposes to cast the problem of interpretability and fairness in computer vision and machine learning applications as data-to-data translation, i.e. learning a mapping from an input domain to a fair target domain, where a fairness definition is being enforced.



Censoring Representations with an Adversary

This work formulates the adversarial model as a minimax problem, optimizes that minimax objective using a stochastic gradient alternate min-max optimizer, demonstrates the ability to provide discriminant-free representations for standard test problems, and compares with previous state-of-the-art methods for fairness.

Learning Fair Representations

We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the

Equality of Opportunity in Supervised Learning

This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.

Domain-Adversarial Training of Neural Networks

A new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions; it can be implemented in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.

Beyond Globally Optimal: Focused Learning for Improved Recommendations

This work offers a new technique called focused learning, based on hyperparameter optimization and a customized matrix factorization objective, which demonstrates prediction accuracy improvements on multiple datasets.

The Variational Fair Autoencoder

This model is based on a variational autoencoding architecture with priors that encourage independence between sensitive and latent factors of variation that is more effective than previous work in removing unwanted sources of variation while maintaining informative latent representations.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal functions that can be chosen in hindsight.

Inherent Trade-Offs in the Fair Determination of Risk Scores

Some of the ways in which key notions of fairness are incompatible with each other are suggested, and hence a framework for thinking about the trade-offs between them is provided.

Domain Separation Networks

The novel architecture results in a model that outperforms the state-of-the-art on a range of unsupervised domain adaptation scenarios and additionally produces visualizations of the private and shared representations enabling interpretation of the domain adaptation process.