A Causal Framework for Discovering and Removing Direct and Indirect Discrimination

Lu Zhang, Yongkai Wu, Xintao Wu
In this paper, we investigate the problem of discovering both direct and indirect discrimination from the historical data, and removing the discriminatory effects before the data is used for predictive analysis (e.g., building classifiers). The main drawback of existing methods is that they cannot distinguish the part of influence that is really caused by discrimination from all correlated influences. In our approach, we make use of the causal network to capture the causal structure of the data… 

Anti-discrimination learning: a causal modeling-based framework

  • Lu Zhang, Xintao Wu
  • Computer Science
    International Journal of Data Science and Analytics
  • 2017
A causal modeling-based framework for anti-discrimination learning is introduced, covering two works on discovering and preventing both direct and indirect system-level discrimination in the training data, and one work on extending the non-discrimination result from the training data to prediction.

On Discrimination Discovery and Removal in Ranked Data using Causal Graph

This paper studies the fairness-aware ranking problem which aims to discover discrimination in ranked datasets and reconstruct the fair ranking, and proposes to map the rank position to a continuous score variable that represents the qualification of the candidates.

Achieving Non-Discrimination in Data Release

The key to discrimination discovery and prevention is to find the meaningful partitions that can be used to provide quantitative evidence for the judgment of discrimination, and a simple criterion for the claim of non-discrimination is developed.

Achieving non-discrimination in prediction

This paper adopts the causal model for modeling the data generation mechanism, formally defines discrimination in population, in a dataset, and in prediction, and develops a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee.

Marrying Fairness and Explainability in Supervised Learning

This work formalizes direct discrimination as a direct causal effect of the protected attributes on the decisions, while induced discrimination is formalized as a change in the causal influence of non-protected features associated with the protected attributes.

Identifying Bias in Data Using Two-Distribution Hypothesis Tests

This work identifies biases in training data with respect to proposed distributions without the need to train a model, and can return a "closest plausible explanation" for a given dataset, potentially revealing underlying biases in the process that generated it.

A Causal Approach for Unfair Edge Prioritization and Discrimination Removal

It is proved that cumulative unfairness towards sensitive groups in a decision, like race in a bail decision, is non-existent when edge unfairness is absent, and a toolkit to mitigate unfairness during data generation is provided by the Unfair Edge Prioritization algorithm.

Counterfactual Fairness with Partially Known Causal Graph

Interestingly, it is found that counterfactual fairness can be achieved as if the true causal graph were fully known, when specific background knowledge is provided: the sensitive attributes do not have ancestors in the causal graph.

Fair Data Integration

This work proposes an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features and theoretically proves the correctness of the proposed algorithm.

Causal Feature Selection for Algorithmic Fairness

This work proposes an approach to identify a sub-collection of features that ensure fairness of the dataset by performing conditional independence tests between different subsets of features, theoretically proves the correctness of the proposed algorithm, and shows that a sublinear number of conditional independence tests is sufficient to identify these variables.



Situation Testing-Based Discrimination Discovery: A Causal Inference Approach

A general technique is proposed to capture discrimination based on the legally grounded situation testing methodology; it makes use of Causal Bayesian Networks and the associated causal inference as a guideline.

Exposing the probabilistic causal structure of discrimination

This paper defines a method to extract, from a dataset of historical decision records, the causal structures existing among the attributes in the data, and develops a type of constrained Bayesian network, which it dubs Suppes-Bayes causal network (SBCN).

Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part I: Algorithms and Empirical Evaluation

It is found that non-causal feature selection methods cannot be interpreted causally even when they achieve excellent predictivity, so only local causal techniques should be used when insight into causal structure is sought.

A Methodology for Direct and Indirect Discrimination Prevention in Data Mining

This paper discusses how to clean training data sets and outsourced data sets in such a way that direct and/or indirect discriminatory decision rules are converted to legitimate (nondiscriminatory) classification rules and proposes new techniques applicable for direct or indirect discrimination prevention individually or both at the same time.

Handling Conditional Discrimination

This work develops local techniques for handling conditional discrimination when one of the attributes is considered to be explanatory, and demonstrates that the new local techniques remove exactly the bad discrimination, allowing differences in decisions as long as they are explainable.

Auditing black-box models for indirect influence

This paper presents a technique for auditing black-box models, which lets us study the extent to which existing models take advantage of particular features in the data set, without knowing how the models work.

Order-independent constraint-based causal structure learning

This work proposes several modifications of the PC-algorithm (and hence also of the other algorithms) that remove part or all of this order-dependence, and shows that these modifications yield similar performance in low-dimensional settings and improved performance in high-dimensional settings.

Three naive Bayes approaches for discrimination-free classification

Three approaches for making the naive Bayes classifier discrimination-free are presented: modifying the probability of the decision being positive, training one model for every sensitive attribute value and balancing them, and adding a latent variable to the Bayesian model that represents the unbiased label and optimizing the model parameters for likelihood using expectation maximization.

A New Look at Causal Independence

Data mining for discrimination discovery

This article formalizes the processes of direct and indirect discrimination discovery by modelling protected-by-law groups and contexts where discrimination occurs in a classification-rule-based syntax, proposes two inference models, and provides automatic procedures for their implementation.