To Split or not to Split: The Impact of Disparate Treatment in Classification

Hao Wang, Hsiang Hsu, Mario Díaz, Flávio du Pin Calmon
IEEE Transactions on Information Theory
Disparate treatment occurs when a machine learning model produces different decisions for individuals based on a legally protected or sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it may be acceptable to fit a model that exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers…
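To make the comparison concrete, here is a minimal toy sketch (not the paper's method or data): a "split" classifier fits a separate decision threshold per group, while a "group-blind" classifier fits one threshold for the pooled data. The data, the threshold rule, and all names are hypothetical, chosen only so that the per-group optimal thresholds differ.

```python
# Hypothetical illustration: split vs. group-blind threshold classifiers
# on toy data where the label boundary differs between groups.

def best_threshold(xs, ys):
    """Pick the threshold t maximizing accuracy of the rule 1[x >= t]."""
    def acc(t):
        return sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
    return max(sorted(set(xs)), key=acc)

def accuracy(t, xs, ys):
    return sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)

# Toy data: group A's labels flip at x = 5, group B's at x = 8.
xa, ya = zip(*[(x, int(x >= 5)) for x in range(10)])
xb, yb = zip(*[(x, int(x >= 8)) for x in range(10)])

# Group-blind: a single threshold fit on the pooled data.
t_blind = best_threshold(xa + xb, ya + yb)
blind_acc = accuracy(t_blind, xa + xb, ya + yb)

# Split: one threshold per group (requires using the group attribute).
t_a, t_b = best_threshold(xa, ya), best_threshold(xb, yb)
split_acc = (accuracy(t_a, xa, ya) + accuracy(t_b, xb, yb)) / 2

print(blind_acc, split_acc)  # split is at least as accurate as group-blind
```

Because the optimal boundary differs across the two toy groups, no single threshold can match the per-group thresholds, so the split classifier attains higher accuracy at the cost of explicitly using the group attribute.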


Minimax Pareto Fairness: A Multi Objective Perspective

This work proposes a fairness criterion under which a classifier achieves minimax risk and is Pareto-efficient with respect to all groups, avoiding unnecessary harm. The criterion can yield the best zero-gap model if policy dictates it, and a simple optimization algorithm compatible with deep neural networks is provided to satisfy these constraints.

Aleatoric and Epistemic Discrimination in Classification

The results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination; however, when data has missing values, there is still room for improvement in handling aleatoric discrimination.

Quantifying Feature Contributions to Overall Disparity Using Information Theory

When a machine-learning algorithm makes biased decisions, it can be helpful to understand the “sources” of disparity to explain why the bias exists. Towards this, we examine the problem of

Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

This paper proposes an integrated approach based on decision trees that does not require separate imputation and learning steps: it trains a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and optimizes a fairness-regularized objective function.

Responsible and Regulatory Conform Machine Learning for Medicine: A Survey of Challenges and Solutions

This paper surveys the technical and procedural challenges involved in creating medical machine learning systems responsibly and in conformity with existing regulations, as well as possible solutions to address these challenges.

Can Information Flows Suggest Targets for Interventions in Neural Circuits?

It is shown that pruning edges that carry larger information flows about the protected attribute reduces bias at the output to a greater extent, demonstrating that M-information flow can meaningfully suggest targets for interventions.

Optimality and Stability in Federated Learning: A Game-theoretic Approach

This work motivates and defines a notion of optimality given by the average error rates among federating agents (players), and gives the first constant-factor bound on the performance gap between stability and optimality.

How Costly is Noise? Data and Disparities in Consumer Credit

A structural model of lending with heterogeneity in information is estimated, finding that equalizing the precision of credit scores can reduce disparities in approval rates and in credit misallocation for disadvantaged groups by approximately half.

Impact of Data Processing on Fairness in Supervised Learning

It is shown that, under some mild conditions, pre-processing outperforms post-processing; moreover, with an appropriate choice of discrimination measure, the optimization problem for both pre- and post-processing approaches reduces to a linear program and can therefore be solved efficiently.

Decoupled Classifiers for Group-Fair and Efficient Machine Learning

A simple and efficient decoupling technique is provided, which can be added on top of any black-box machine learning algorithm, to learn different classifiers for different groups.

OpenML: networked science in machine learning

This paper introduces OpenML, a place for machine learning researchers to share and organize data in fine detail, so that they can work more effectively, be more visible, and collaborate with others to tackle harder problems.

Impossibility Theorems for Domain Adaptation

The domain adaptation problem in machine learning occurs when the test data generating distribution differs from the one that generates the training data. It is clear that the success of learning

Data preprocessing techniques for classification without discrimination

This paper surveys and extends existing data preprocessing techniques, namely suppression of the sensitive attribute, massaging the dataset by changing class labels, and reweighing or resampling the data to remove discrimination without relabeling instances, and presents the results of experiments on real-life data.

Model Projection: Theory and Applications to Fair Machine Learning

The model projection formulation can be directly used to design fair models according to different group fairness metrics and generalizes existing approaches within the fair machine learning literature.

Algorithmic Fairness

An overview of the main concepts of identifying, measuring and improving algorithmic fairness when using AI algorithms is presented and the most commonly used fairness-related datasets in this field are described.

An Information-Theoretic Quantification of Discrimination with Exempt Features

This work proposes a novel information-theoretic decomposition of the total discrimination into a non-exempt component, which quantifies the part of the discrimination that cannot be accounted for by the critical features, and an exempt component, which quantifies the remaining discrimination.

Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

The Equal Opportunity fairness constraint combined with ERM will provably recover the Bayes optimal classifier under a range of bias models, and these theoretical results provide additional motivation for considering fairness interventions even when an actor cares primarily about accuracy.

Fairness With Minimal Harm: A Pareto-Optimal Approach For Healthcare

This work argues that even in domains where fairness comes at a cost, finding a fairness model that avoids unnecessary harm is the optimal initial step, and presents a methodology for training neural networks that achieve this goal by dynamically re-balancing subgroup risks.

Estimating Skin Tone and Effects on Classification Performance in Dermatology Datasets

This paper uses the individual typology angle (ITA) to approximate skin tone in dermatology datasets and finds no measurable correlation between machine learning model performance and ITA values, though more comprehensive data is needed for further validation.