Corpus ID: 245650538

Fair Data Representation for Machine Learning at the Pareto Frontier

Shizhou Xu and Thomas Strohmer
As machine-learning-powered decision making plays an increasingly important role in our daily lives, it is imperative to strive for fairness in the underlying data processing and algorithms. We propose a pre-processing algorithm for fair data representation via which supervised learning with an L² objective results in an estimation of the Pareto frontier between prediction error and statistical disparity. In particular, the present work applies the optimal positive definite affine…
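The abstract describes a pre-processing approach: group-conditional feature distributions are transported toward a common (Wasserstein-barycenter-style) target, so that any downstream L² learner trained on the repaired data trades prediction error against statistical disparity. A minimal one-dimensional sketch of this idea, using quantile alignment (which is exact optimal transport in 1-D), might look as follows. All names and the interpolation parameter `t` are illustrative assumptions, not the paper's actual algorithm or notation.

```python
import numpy as np


def fair_representation_1d(x, groups, t=1.0):
    """Sketch: move each group's 1-D feature distribution toward the
    Wasserstein barycenter of the group-conditional distributions via
    quantile alignment (1-D optimal transport).

    t = 1.0 gives full repair (statistical parity in the feature);
    t in (0, 1) interpolates between the original data and the
    barycenter, tracing an accuracy-disparity trade-off path.
    This is an illustrative sketch, not the paper's method.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    q_grid = np.linspace(0.0, 1.0, 101)

    # Barycenter quantile function: group-size-weighted average of
    # the per-group quantile functions on a common grid.
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    bary_q = sum(w * np.quantile(x[groups == g], q_grid)
                 for g, w in zip(labels, weights))

    for g in labels:
        mask = groups == g
        xg = x[mask]
        # Empirical quantile level of each point within its group.
        ranks = np.searchsorted(np.sort(xg), xg, side="right") / len(xg)
        # Transport each point to the barycenter at the same level.
        target = np.interp(ranks, q_grid, bary_q)
        out[mask] = (1.0 - t) * xg + t * target
    return out
```

With `t=1.0`, the group-conditional distributions of the output coincide (up to discretization), so a predictor trained on the repaired feature cannot exhibit statistical disparity with respect to the group label; intermediate `t` values sweep out points between the original data and full repair.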

Learning Fair Representations
We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole) and individual fairness (similar individuals are treated similarly).
Equality of Opportunity in Supervised Learning
This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
Data preprocessing techniques for classification without discrimination
This paper surveys and extends existing data preprocessing techniques (suppression of the sensitive attribute, massaging the dataset by changing class labels, and reweighing or resampling the data) to remove discrimination without relabeling instances, and presents the results of experiments on real-life data.
Why Unbiased Computational Processes Can Lead to Discriminative Decision Procedures
This chapter discusses the implicit modeling assumptions made by most data mining algorithms and shows situations in which they are not satisfied and outlines three realistic scenarios in which an unbiased process can lead to discriminatory models.
Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment
A new notion of unfairness, disparate mistreatment, defined in terms of misclassification rates, is introduced for decision boundary-based classifiers and can be easily incorporated into their formulation as convex-concave constraints.
Explanation of Variability and Removal of Confounding Factors from Data through Optimal Transport
A methodology based on the theory of optimal transport is developed to attribute variability in data sets to known and unknown factors and to remove such attributable components of the variability.
Optimized Pre-Processing for Discrimination Prevention
This paper proposes a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility, and describes the impact of limited sample size in accomplishing this objective.
Wasserstein Fair Classification
An approach to fair classification is proposed that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances and is robust to specific choices of the threshold used to obtain class predictions from model outputs.
A Convex Framework for Fair Regression
By varying the weight on the fairness regularizer, this work can compute the efficient frontier of the accuracy-fairness trade-off on any given dataset, and measure the severity of this trade-off via a numerical quantity the authors call the Price of Fairness.
Projection to Fairness in Statistical Learning
The methodology leverages tools from optimal transport to construct efficiently the projection to fairness of any given estimator as a simple post-processing step, and precisely quantifies the cost of fairness, measured in terms of prediction accuracy.