Corpus ID: 239998264

Sample Selection for Fair and Robust Training

  • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh
Fairness and robustness are critical elements of Trustworthy AI that need to be addressed together. Fairness is about learning an unbiased model, while robustness is about learning from corrupted data, and it is known that addressing only one of them may have an adverse effect on the other. In this work, we propose a sample selection-based algorithm for fair and robust training. To this end, we formulate a combinatorial optimization problem for the unbiased selection of samples in the presence… 
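The combinatorial selection itself is truncated in the excerpt above, but the core idea of selecting samples that are simultaneously clean and unbiased can be sketched as keeping the smallest-loss samples at an equal rate within every (sensitive group, label) cell. All names below are illustrative, not the authors' algorithm:

```python
import numpy as np

def select_fair_clean(losses, groups, labels, clean_ratio=0.8):
    """Keep the `clean_ratio` fraction of smallest-loss samples within
    each (group, label) cell, so every cell retains the same selection
    rate: small-loss selection handles label corruption, while the
    per-cell quota keeps the selected set demographically unbiased."""
    selected = np.zeros(len(losses), dtype=bool)
    for g in np.unique(groups):
        for y in np.unique(labels):
            idx = np.where((groups == g) & (labels == y))[0]
            if idx.size == 0:
                continue
            k = max(1, int(round(clean_ratio * idx.size)))
            keep = idx[np.argsort(losses[idx])[:k]]
            selected[keep] = True
    return selected
```

The model would then be trained only on the selected subset each epoch, with `clean_ratio` chosen from an estimate of the corruption rate.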


Sample Selection with Deadline Control for Efficient Federated Learning on Heterogeneous Clients
  • Jaemin Shin, Yuanchun Li, Yunxin Liu, Sung-Ju Lee
  • Computer Science
  • 2022
This work proposes FedBalancer, a systematic FL framework that actively selects clients’ training samples while respecting the privacy and computational capabilities of clients, and introduces an adaptive deadline control scheme that predicts the optimal deadline for each round as client training data varies.
Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective
This article extends tutorials the authors delivered at the VLDB 2020 [134] and KDD 2021 [82] conferences and studies the research landscape for data collection and data quality, primarily for deep learning applications.
Robust Fairness-aware Learning Under Sample Selection Bias
A framework for robust and fair learning under sample selection bias is proposed; fairness is achieved under the worst case, which guarantees the model’s fairness on test data during the minimax optimization.
Noise-tolerant fair classification
If one measures fairness using the mean-difference score, and the sensitive features are subject to noise from the mutually contaminated learning model, then, owing to a simple identity, the authors only need to change the desired fairness tolerance; the requisite tolerance can be estimated by leveraging existing noise-rate estimators from the label-noise literature.
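The "simple identity" described above amounts to a rescaling: under mutually contaminated groups, the mean-difference score computed on noisy sensitive features shrinks by a constant factor depending on the noise rates, so the tolerance passed to the fairness constraint shrinks by the same factor. A minimal sketch, assuming binary groups with known (or estimated) noise rates `alpha` and `beta`:

```python
def noisy_tolerance(tau, alpha, beta):
    """Rescale the desired fairness tolerance `tau` for use on noisy
    sensitive features: under mutually contaminated groups with noise
    rates alpha and beta, the observed mean-difference score shrinks
    by a factor (1 - alpha - beta), so the constraint on noisy data
    should use tau * (1 - alpha - beta) instead of tau."""
    return tau * (1.0 - alpha - beta)
```

In practice `alpha` and `beta` would come from a noise-rate estimator rather than being known exactly.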
On Adversarial Bias and the Robustness of Fair Machine Learning
The robustness of fair machine learning is analyzed through an empirical evaluation of attacks on multiple algorithms and benchmark datasets, showing that giving the same importance to groups of different sizes and distributions, to counteract the effect of bias in training data, can be in conflict with robustness.
FairBatch: Batch Selection for Model Fairness
The batch selection algorithm, which the authors call FairBatch, implements this optimization and supports prominent fairness measures: equal opportunity, equalized odds, and demographic parity and is compatible with existing batch selection techniques intended for different purposes, thus gracefully achieving multiple purposes.
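FairBatch's key mechanism is adapting the minibatch sampling rate of each sensitive group during training. A toy single-step sketch of that idea (not the paper's exact update rule) shifts sampling mass toward the group currently suffering the larger loss:

```python
import numpy as np

def fairbatch_step(group_losses, probs, lr=0.1):
    """One FairBatch-style adjustment (sketch): move minibatch
    sampling probability toward the worst-off group, then renormalize
    so the result is still a valid probability vector. A floor keeps
    every group represented in future batches."""
    g_hi = int(np.argmax(group_losses))
    probs = np.asarray(probs, dtype=float).copy()
    probs[g_hi] += lr
    probs = np.clip(probs, 0.05, None)
    return probs / probs.sum()
```

A training loop would recompute per-group losses each epoch and call this update before drawing the next batches, which is what makes the approach compatible with other batch selection techniques.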
Fairness-Aware Learning from Corrupted Data
It is shown that an adversary can force any learner to return a biased classifier, with or without degrading accuracy, and that the strength of this bias increases for learning problems with underrepresented protected groups in the data.
Learning Fair Representations
We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole) and individual fairness (similar individuals are treated similarly).
Identifying and Correcting Label Bias in Machine Learning
This paper provides a mathematical formulation of how this bias can arise by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups.
Classification with Noisy Labels by Importance Reweighting
  • Tongliang Liu, D. Tao
  • Computer Science, Mathematics
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2016
It is proved that any surrogate loss function can be used for classification with noisy labels via importance reweighting, with a consistency guarantee that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free data.
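The reweighting idea can be sketched for binary labels: weight each sample's surrogate loss by beta = P_clean(y|x) / P_noisy(y|x), where the clean posterior is recovered from the noisy one by inverting the known (or estimated) noise process. The parameter names below are illustrative:

```python
def importance_weight(p_noisy, rho_into, rho_sum):
    """Importance weight beta = P_clean(y|x) / P_noisy(y|x) for an
    observed binary label y (sketch, assuming known noise rates):
      p_noisy  -- posterior of the observed label under noisy data
      rho_into -- rate at which the opposite class flips into y
      rho_sum  -- sum of the two class-conditional noise rates
    The clean posterior follows from inverting the label-noise model;
    weighting any surrogate loss by beta makes it consistent."""
    p_clean = (p_noisy - rho_into) / (1.0 - rho_sum)
    p_clean = min(max(p_clean, 0.0), 1.0)  # clamp to a valid probability
    return p_clean / p_noisy
```

With zero noise rates the weight reduces to 1, recovering ordinary unweighted training.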
Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification
It is shown that the fair optimal classifier is obtained by recalibrating the Bayes classifier by a group-dependent threshold and the overall procedure is shown to be statistically consistent both in terms of the classification error and fairness measure.
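The recalibration described above is simple to state concretely: score every sample with the (estimated) Bayes regression function, then threshold the score with a cutoff that depends on the sample's group. A minimal sketch, with illustrative names:

```python
import numpy as np

def predict_with_group_thresholds(scores, groups, thresholds):
    """Recalibrate a Bayes-like score eta(x) with a group-dependent
    threshold: predict the positive class when the score meets the
    threshold of the sample's group. `thresholds` maps group id to
    cutoff; the cutoffs would be tuned to equalize the chosen
    fairness measure across groups."""
    t = np.array([thresholds[g] for g in groups], dtype=float)
    return (np.asarray(scores, dtype=float) >= t).astype(int)
```

Unlabeled data enters the original procedure when estimating the group-dependent thresholds; the thresholding step itself is as above.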
Fairness Without Demographics in Repeated Loss Minimization
This paper develops an approach based on distributionally robust optimization (DRO), which minimizes the worst case risk over all distributions close to the empirical distribution and proves that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice.
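When the uncertainty set consists of group-conditional distributions, the DRO objective reduces to the worst average loss over groups, which is what bounds the minority group's risk. A minimal sketch of that objective (the full method optimizes over a broader set of nearby distributions):

```python
import numpy as np

def worst_group_risk(losses, groups):
    """DRO-style objective (sketch): the maximum of the per-group
    average losses. Minimizing this, rather than the overall average,
    controls the risk of the worst-off (e.g. minority) group at each
    training step."""
    return max(losses[groups == g].mean() for g in np.unique(groups))
```

Notably, evaluating this objective over candidate groupings does not require group labels on individual samples, which is why the approach works "without demographics".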