• Corpus ID: 186206648

Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, L. Oneto, Massimiliano Pontil
We study the problem of fair binary classification using the notion of Equal Opportunity, which requires the true positive rate to be the same across the sensitive groups. Within this setting we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier with a group-dependent threshold, for which we provide a constructive expression. This result motivates us to devise a plug-in classification procedure based on both unlabeled and labeled datasets. While…
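A rough sketch of the plug-in idea described above, not the authors' exact estimator: here `eta` stands for any estimate of P(Y=1|X) fit on labeled data, the grid search and the unlabeled-data proxy TPR(t) = E[eta · 1{eta > t}] / E[eta] are simplifications for illustration, and all names are assumptions.

```python
import numpy as np

def group_thresholds(eta, group, target=0.5, grid_size=101):
    """For each sensitive group s, pick a threshold t_s so that a proxy
    for the group's true positive rate, computable from unlabeled data
    alone as E[eta * 1{eta > t}] / E[eta], is close to a common target."""
    thresholds = {}
    grid = np.linspace(0.0, 1.0, grid_size)
    for s in np.unique(group):
        e = eta[group == s]
        proxy_tpr = np.array([(e * (e > t)).sum() / e.sum() for t in grid])
        thresholds[s] = grid[np.argmin(np.abs(proxy_tpr - target))]
    return thresholds

def predict(eta, group, thresholds):
    """Plug-in classifier: threshold the estimated eta with the
    threshold of each point's sensitive group."""
    return np.array([int(eta[i] > thresholds[g]) for i, g in enumerate(group)])
```

With equal thresholds this reduces to the usual Bayes plug-in rule; the group-dependent recalibration is what enforces (approximate) Equal Opportunity.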


Fairness in Semi-Supervised Learning: Unlabeled Data Help to Reduce Discrimination

Presents a framework for fair semi-supervised learning in the pre-processing phase: pseudo-labeling to predict labels for unlabeled data, a re-sampling method to obtain multiple fair datasets, and ensemble learning to improve accuracy and reduce discrimination.

Fairness-aware Model-agnostic Positive and Unlabeled Learning

Proposes a fairness-aware positive and unlabeled learning (PUL) method named FairPUL, based on an analysis of the optimal fair classifier for PUL, together with a model-agnostic post-processing framework that leverages both positive and unlabeled examples and is proven statistically consistent in terms of both classification error and the fairness metric.

Fairness guarantee in multi-class classification

The enhanced estimator is proved to mimic the behavior of the optimal rule in terms of both fairness and risk, and is competitive with the state-of-the-art in-processing method fairlearn in the multi-class classification setting.

Optimized Score Transformation for Consistent Fair Classification

This paper formulates the problem of transforming scores to satisfy fairness constraints that are linear in conditional means of scores while minimizing a cross-entropy objective, and proposes a method called FairScoreTransformer that approaches this solution using a combination of standard probabilistic classifiers and ADMM.

Optimized Score Transformation for Fair Classification

Comprehensive experiments show that the proposed FairScoreTransformer has advantages for score-based metrics such as Brier score and AUC while remaining competitive for binary label-based metrics such as accuracy.

Addressing Fairness in Classification with a Model-Agnostic Multi-Objective Algorithm

Defines a differentiable relaxation that approximates fairness notions provably better than existing relaxations, and proposes a model-agnostic multi-objective architecture that can simultaneously optimize for multiple fairness notions and multiple sensitive attributes, supporting all statistical-parity-based notions of fairness.

Fair regression via plug-in estimator and recalibration with statistical guarantees

This work studies the problem of learning an optimal regression function subject to a fairness constraint by leveraging a discretized proxy version of the problem, for which an explicit expression of the optimal fair predictor is derived.

Bayes-Optimal Classifiers under Group Fairness

A group-based thresholding method, called FairBayes, is proposed that can directly control disparity and achieve an essentially optimal fairness-accuracy tradeoff.

Fairness Constraints in Semi-supervised Learning

A framework for fair semi-supervised learning is developed, which includes a classifier loss to optimize accuracy, a label propagation loss to optimize predictions on unlabeled data, and fairness constraints over labeled and unlabeled data to optimize the fairness level.

Taking Advantage of Multitask Learning for Fair Classification

This paper proposes to use Multitask Learning (MTL), enhanced with fairness constraints, to jointly learn group-specific classifiers that leverage information between sensitive groups, taking a three-pronged approach to fairness: increasing accuracy on each group, enforcing measures of fairness during training, and protecting sensitive information during testing.

Learning Fair Representations

We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the…

Confidence Sets with Expected Sizes for Multiclass Classification

A general device that, given an unlabeled dataset and a score function defined as the minimizer of some empirical convex risk, outputs a set of class labels instead of a single one, providing a significant improvement in classification risk.

On Fairness and Calibration

It is shown that calibration is compatible only with a single error constraint, and that any algorithm that satisfies this relaxation is no better than randomizing a percentage of predictions for an existing classifier.

The cost of fairness in binary classification

This work relates two existing fairness measures to cost-sensitive risks, and shows that for such cost-sensitive fairness measures the optimal classifier is an instance-dependent thresholding of the class-probability function.

Least Ambiguous Set-Valued Classifiers With Bounded Error Levels

This work introduces a framework for multiclass set-valued classification, where the classifiers guarantee user-defined levels of coverage or confidence while minimizing the ambiguity (the expected size of the output).
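One simple way such coverage-guaranteed set-valued classifiers can be calibrated is in the split-conformal style, sketched below under assumed names (this is an illustration of the general idea, not the authors' exact construction): the threshold is the alpha-quantile of the true-class probabilities on held-out data, and the prediction set contains every label whose score clears it.

```python
import numpy as np

def calibrate_threshold(probs_cal, y_cal, alpha=0.1):
    """alpha-quantile of the true-class probabilities on a calibration
    split: about 1 - alpha of held-out points keep their true label
    at or above this score."""
    true_scores = probs_cal[np.arange(len(y_cal)), y_cal]
    return np.quantile(true_scores, alpha)

def predict_sets(probs, t):
    """Output every label whose estimated probability clears t;
    low-confidence points get larger (more ambiguous) sets."""
    return [np.where(p >= t)[0].tolist() for p in probs]
```

Shrinking alpha raises coverage at the price of larger sets, which is exactly the coverage/ambiguity tradeoff the entry above refers to.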

Equality of Opportunity in Supervised Learning

This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features, and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.

Data preprocessing techniques for classification without discrimination

This paper surveys and extends existing data preprocessing techniques (suppression of the sensitive attribute, massaging the dataset by changing class labels, and reweighing or resampling the data) to remove discrimination without relabeling instances, and presents the results of experiments on real-life data.
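The reweighing idea admits a compact sketch (a standard formulation with illustrative names, not necessarily the paper's exact notation): each (group, label) cell is weighted by its expected count under independence divided by its observed count, so that group and label become independent under the weighted distribution.

```python
import numpy as np

def reweigh(group, label):
    """Reweighing for discrimination-free preprocessing: weight each
    (s, y) cell by expected count under independence / observed count."""
    group = np.asarray(group)
    label = np.asarray(label)
    n = len(group)
    w = np.empty(n, dtype=float)
    for s in np.unique(group):
        for y in np.unique(label):
            cell = (group == s) & (label == y)
            if cell.any():
                w[cell] = ((group == s).sum() * (label == y).sum()
                           / (n * cell.sum()))
    return w
```

A classifier trained with these instance weights sees a dataset in which every group has the same (weighted) positive rate, without any label being changed.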

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

This work frames the problem as a two-player game where one player optimizes the model parameters on a training dataset, and the other player enforces the constraints on an independent validation dataset, to improve generalization performance.
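The two-player scheme can be caricatured for logistic regression, with a smooth statistical-parity gap standing in for the paper's general data-dependent constraints (an assumption for illustration; all names and hyperparameters here are made up): one player takes gradient steps on the penalized training loss, the other raises the multiplier whenever the constraint is violated on the independent validation set.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_two_player(Xtr, ytr, gtr, Xval, gval, eps=0.05,
                     steps=4000, lr=0.1, lam_lr=0.02):
    """theta-player: gradient steps on logistic loss plus a Lagrangian
    penalty on a smooth parity gap, computed on training data.
    lambda-player: raises lam whenever the gap measured on the
    independent validation set exceeds the slack eps."""
    w = np.zeros(Xtr.shape[1])
    lam = 0.0
    A, B = gtr == 0, gtr == 1
    for _ in range(steps):
        p = sigmoid(Xtr @ w)
        gap = p[A].mean() - p[B].mean()          # signed parity gap (train)
        g_loss = Xtr.T @ (p - ytr) / len(ytr)    # logistic loss gradient
        d = p * (1.0 - p)                        # sigmoid derivative
        g_gap = Xtr[A].T @ d[A] / A.sum() - Xtr[B].T @ d[B] / B.sum()
        w -= lr * (g_loss + lam * np.sign(gap) * g_gap)
        pv = sigmoid(Xval @ w)
        gap_val = abs(pv[gval == 0].mean() - pv[gval == 1].mean())
        lam = max(0.0, lam + lam_lr * (gap_val - eps))
    return w, lam
```

Enforcing the constraint on held-out data, rather than on the training set the first player optimizes, is what the entry above credits for the improved generalization of the fairness guarantee.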

Consistent Multilabel Classification

This work shows that for multilabel metrics constructed as instance-, micro- and macro-averages, the population optimal classifier can be decomposed into binary classifiers based on the marginal instance-conditional distribution of each label, with a weak association between labels via the threshold.