More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
Increased awareness of the risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. The vast majority of proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently, and at the intersection of these two categories, methods that perform active learning under fairness constraints have been developed. However, proposed bias mitigation… 

Identifying and Correcting Label Bias in Machine Learning

This paper provides a mathematical formulation of how this bias can arise by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups.

Social Norm Bias: Residual Harms of Fairness-Aware Algorithms

This work characterizes Social Norm Bias (SNoB), a subtle but consequential type of algorithmic discrimination that may be exhibited by machine learning models even when these systems achieve group fairness objectives, by measuring how an algorithm's predictions are associated with conformity to inferred gender norms.

Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data

Three potential approaches are presented and discussed for dealing with the knowledge and information deficits that arise when fairness issues are emergent properties of complex sociotechnical systems.

Fairness Constraints: Mechanisms for Fair Classification

This paper introduces a flexible mechanism to design fair classifiers by leveraging a novel intuitive measure of decision boundary (un)fairness, and shows on real-world data that this mechanism allows for a fine-grained control on the degree of fairness, often at a small cost in terms of accuracy.

AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias

A new open source Python toolkit for algorithmic fairness, AI Fairness 360 (AIF360), released under an Apache v2.0 license to help facilitate the transition of fairness research algorithms to use in an industrial setting and to provide a common framework for fairness researchers to share and evaluate algorithms.

A statistical framework for fair predictive algorithms

A method is proposed to remove bias from predictive models by removing all information regarding protected variables from the training data; the method is general enough to accommodate arbitrary data types, e.g., binary, continuous, etc.

Fairness Evaluation in Presence of Biased Noisy Labels

This work proposes a sensitivity analysis framework for assessing how assumptions on the noise across groups affect the predictive bias properties of the risk assessment model as a predictor of reoffense.

Rehumanized Crowdsourcing: A Labeling Framework Addressing Bias and Ethics in Machine Learning

A framework is proposed that allocates microtasks while considering human factors of workers, such as demographics and compensation; it mitigates biases in the contributor sample and increases the hourly pay given to contributors.

Obtaining Fairness using Optimal Transport Theory

The goals of this paper are to detect when a binary classification rule lacks fairness and to mitigate the potential discrimination attributable to it by modifying either the classifier or the data itself.

Adaptive Sampling to Reduce Disparate Performance

This work considers a setting where data collection and optimization are performed simultaneously, and proposes to guide the resulting classifier toward equal performance on the different groups by adaptively sampling each data point from the group that is currently disadvantaged, following this strategy throughout the whole training process.