• Corpus ID: 239998264

Sample Selection for Fair and Robust Training

Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh
Fairness and robustness are critical elements of Trustworthy AI that need to be addressed together. Fairness is about learning an unbiased model while robustness is about learning from corrupted data, and it is known that addressing only one of them may have an adverse effect on the other. In this work, we propose a sample selection-based algorithm for fair and robust training. To this end, we formulate a combinatorial optimization problem for the unbiased selection of samples in the presence…

FLEA: Provably Fair Multisource Learning from Unreliable Training Data
FLEA is introduced, a filtering-based algorithm that allows the learning system to identify and suppress those data sources that would have a negative impact on fairness or accuracy if they were used for training. It is formally proved that, given enough data, FLEA protects the learner against unreliable data.
FORML: Learning to Reweight Data for Fairness
This work addresses the challenge of jointly optimizing fairness and predictive performance in the multi-class classification setting by introducing Fairness Optimized Reweighting via Meta-Learning (FORML), a training algorithm that balances fairness constraints and accuracy by jointly optimizing training sample weights and a neural network’s parameters.
Sample Selection with Deadline Control for Efficient Federated Learning on Heterogeneous Clients
This work proposes FedBalancer, a systematic FL framework that actively selects clients’ training samples while respecting privacy and computational capabilities of clients, and introduces an adaptive deadline control scheme that predicts the optimal deadline for each round with varying client train data.
Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective
This article extends tutorials the authors delivered at the VLDB 2020 [134] and KDD 2021 [82] conferences and studies the research landscape for data collection and data quality, primarily for deep learning applications.
Breaking Fair Binary Classification with Optimal Flipping Attacks
The minimum amount of data corruption required for a successful data poisoning attack is studied and lower/upper bounds on this quantity are shown to be tight when the target model is the unique unconstrained risk minimizer.
Segmenting across places: The need for fair transfer learning with satellite imagery
The increasing availability of high-resolution satellite imagery has enabled the use of machine learning to support land-cover measurement and inform policy-making. However, labelling satellite


Robust Fairness-aware Learning Under Sample Selection Bias
A framework for robust and fair learning under sample selection bias is proposed. Fairness is enforced under the worst case during the minimax optimization, which guarantees the model’s fairness on test data.
Noise-tolerant fair classification
If one measures fairness using the mean-difference score, and sensitive features are subject to noise from the mutually contaminated learning model, then owing to a simple identity the authors only need to change the desired fairness-tolerance, and the requisite tolerance can be estimated by leveraging existing noise-rate estimators from the label noise literature.
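For readers unfamiliar with the mean-difference score mentioned above, the following is a minimal sketch of how it is commonly computed for binary predictions and a binary sensitive attribute. The function name `mean_difference` and the toy data are illustrative assumptions, not taken from the paper, and the sketch does not implement the paper's noise correction.

```python
def mean_difference(predictions, groups):
    """Absolute gap in positive-prediction rate between group 0 and group 1.

    `predictions` holds 0/1 model outputs; `groups` holds the 0/1 sensitive
    attribute for the same samples. Both names are hypothetical.
    """
    rates = {}
    for g in (0, 1):
        members = [p for p, s in zip(predictions, groups) if s == g]
        if not members:
            raise ValueError(f"group {g} has no samples")
        rates[g] = sum(members) / len(members)
    return abs(rates[0] - rates[1])


# Toy data (made up for illustration): group 0 receives positive
# predictions more often than group 1.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
gap = mean_difference(preds, groups)
```

A score of zero would mean both groups receive positive predictions at the same rate; a fairness tolerance bounds how far above zero the score may be.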
On Adversarial Bias and the Robustness of Fair Machine Learning
The robustness of fair machine learning is analyzed through an empirical evaluation of attacks on multiple algorithms and benchmark datasets, showing that giving the same importance to groups of different sizes and distributions, to counteract the effect of bias in training data, can be in conflict with robustness.
Fairness-Aware Learning from Corrupted Data
It is shown that an adversary can force any learner to return a biased classifier, with or without degrading accuracy, and that the strength of this bias increases for learning problems with underrepresented protected groups in the data.
Learning Fair Representations
We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the
Identifying and Correcting Label Bias in Machine Learning
This paper provides a mathematical formulation of how this bias can arise by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups.
Classification with Noisy Labels by Importance Reweighting
Tongliang Liu, D. Tao. Computer Science. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
It is proved that any surrogate loss function can be used for classification with noisy labels by using importance reweighting, with consistency assurance that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample.
Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification
It is shown that the fair optimal classifier is obtained by recalibrating the Bayes classifier by a group-dependent threshold and the overall procedure is shown to be statistically consistent both in terms of the classification error and fairness measure.
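The idea of recalibrating a score-based classifier with a group-dependent threshold can be illustrated with a small sketch. The scores below stand in for estimated class probabilities, and the thresholds are hand-picked for the toy data. In the paper the thresholds are estimated from data, which this sketch does not attempt. The name `threshold_by_group` is hypothetical.

```python
def threshold_by_group(scores, groups, thresholds):
    """Predict 1 when a sample's score reaches its own group's threshold."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


def positive_rate(preds, groups, group):
    """Fraction of samples in `group` that receive a positive prediction."""
    members = [p for p, g in zip(preds, groups) if g == group]
    return sum(members) / len(members)


# Toy scores (made up): group "b" tends to receive lower scores.
scores = [0.9, 0.6, 0.4, 0.7, 0.45, 0.2]
groups = ["a", "a", "a", "b", "b", "b"]

# One shared threshold gives unequal positive rates across groups.
shared = threshold_by_group(scores, groups, {"a": 0.5, "b": 0.5})

# Lowering the threshold for group "b" equalizes the positive rates here.
adjusted = threshold_by_group(scores, groups, {"a": 0.5, "b": 0.4})
```

With the shared threshold, group "a" has a positive rate of 2/3 and group "b" has 1/3. With the adjusted thresholds, both groups have a rate of 2/3.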
Fairness Without Demographics in Repeated Loss Minimization
This paper develops an approach based on distributionally robust optimization (DRO), which minimizes the worst-case risk over all distributions close to the empirical distribution, and proves that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice.
Fairness Constraints: Mechanisms for Fair Classification
This paper introduces a flexible mechanism to design fair classifiers by leveraging a novel intuitive measure of decision boundary (un)fairness, and shows on real-world data that this mechanism allows for a fine-grained control on the degree of fairness, often at a small cost in terms of accuracy.