• Corpus ID: 238744320

# The Rich Get Richer: Disparate Impact of Semi-Supervised Learning

@article{Zhu2022TheRG,
title={The Rich Get Richer: Disparate Impact of Semi-Supervised Learning},
author={Zhaowei Zhu and Tianyi Luo and Yang Liu},
journal={ArXiv},
year={2022},
volume={abs/2110.06282}
}
• Published 12 October 2021
• Computer Science
• ArXiv
Semi-supervised learning (SSL) has demonstrated its potential to improve model accuracy for a variety of learning tasks when high-quality supervised data is severely limited. Although it is often established that the average accuracy for the entire population of data is improved, it is unclear how SSL fares with different sub-populations. Understanding the above question has substantial fairness implications when different sub-populations are defined by the demographic groups that we aim…
16 Citations

## Figures and Tables from this paper

### Evaluating Fairness Without Sensitive Attributes: A Framework Using Only Auxiliary Models

• Computer Science
ArXiv
• 2022
Inspired by the noisy label learning literature, a closed-form relationship is derived between the directly measured fairness metrics and their corresponding ground-truth metrics and then some key statistics are estimated, which are used, together with the derived relationship, to calibrate the fairness metrics.

### Data Feedback Loops: Model-driven Amplification of Dataset Biases

• Computer Science
• 2022
Experiments in three conditional prediction scenarios demonstrate that models that exhibit a sampling-like behavior are more calibrated and thus more stable, and an intervention is proposed to help calibrate and stabilize unstable feedback systems.

### An Information Fusion Approach to Learning with Instance-Dependent Label Noise

• Computer Science
• 2022
Empirical evaluations demonstrate that the posterior transition matrix (PTM) approach is superior to the state-of-the-art approaches, achieves more stable training for instance-dependent label noise, and achieves statistically consistent classifiers.

### Teacher Guided Training: An Efficient Framework for Knowledge Transfer

• Computer Science
ArXiv
• 2022
This paper proposes the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data.

### Understanding Instance-Level Impact of Fairness Constraints

• Computer Science
ICML
• 2022
It is demonstrated with extensive experiments that training on a subset of heavily weighted data examples leads to lower fairness violations at a trade-off in accuracy.

### Transferring Fairness under Distribution Shifts via Fair Consistency Regularization

• Computer Science
ArXiv
• 2022
This paper studies how to transfer model fairness under distribution shifts, a widespread issue in practice, and proposes a practical algorithm with a fair consistency regularization as the key component.

### Input-agnostic Certified Group Fairness via Gaussian Parameter Smoothing

• Computer Science
ICML
• 2022
An input-agnostic certified group fairness algorithm, FairSmooth, is proposed to improve the fairness of classification models while maintaining their prediction accuracy.

### To Aggregate or Not? Learning with Separate Noisy Labels

• Computer Science
ArXiv
• 2022
The theorems conclude that label separation is preferred over label aggregation when the noise rates are high, or the number of labelers/annotations is insufficient.

### Pruning has a disparate impact on model accuracy

• Computer Science
ArXiv
• 2022
Light is shed on disparities in gradient norms and distance to the decision boundary across groups as the factors responsible for this critical issue, and a simple solution is proposed that mitigates the disparate impacts caused by pruning.

### Don’t Throw it Away! The Utility of Unlabeled Data in Fair Decision Making

• Computer Science
FAccT
• 2022
A novel method based on a variational autoencoder is proposed for practical fair decision-making: it learns an unbiased data representation leveraging both labeled and unlabeled data, uses the representation to learn a policy in an online process, and is empirically validated to converge to the ground-truth optimal policy with low variance.

## References

Showing 1–10 of 84 references

### MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

• Computer Science
ACL
• 2020
By mixing labeled, unlabeled, and augmented data, MixText significantly outperforms current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks.

### Unsupervised Data Augmentation

• Computer Science
ArXiv
• 2019
UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods, which leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small.

### Character-level Convolutional Networks for Text Classification

• Computer Science
NIPS
• 2015
This paper constructs several large-scale datasets to show that character-level convolutional networks can achieve state-of-the-art or competitive results in text classification.


### ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring

• Computer Science
ArXiv
• 2019
A variant of AutoAugment which learns the augmentation policy while the model is being trained, and is significantly more data-efficient than prior work, requiring between 5× and 16× less data to reach the same accuracy.

### MixMatch: A Holistic Approach to Semi-Supervised Learning

• Computer Science
NeurIPS
• 2019
This work unifies the current dominant approaches to semi-supervised learning into a new algorithm, MixMatch, which works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp.
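The two operations this summary names, sharpening guessed labels to lower their entropy and mixing examples with MixUp, can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation; the function names are illustrative, and the defaults T=0.5 and alpha=0.75 follow the values reported in the MixMatch paper.

```python
import numpy as np

def sharpen(p, T=0.5):
    # Lower the entropy of a guessed label distribution by raising each
    # probability to the power 1/T and renormalizing (T < 1 sharpens).
    p = p ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)

def mixup(x1, y1, x2, y2, alpha=0.75):
    # MixUp: convex combination of two examples and their label vectors.
    # lam is clamped to be at least 0.5 so the mixed example stays closer
    # to its first input, as MixMatch does.
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

For example, `sharpen` maps the distribution `[0.6, 0.4]` to roughly `[0.69, 0.31]` at T=0.5, pushing the guessed label toward a hard one-hot target.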

### Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning

• Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence
• 2019
A new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input that achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.

### Understanding Instance-Level Label Noise: Disparate Impacts and Treatments

The harms caused by memorizing noisy instances are quantified, the disparate impacts of noisy labels for sample instances with different representation frequencies are shown, and new understandings for when these approaches work are revealed.

### FMP: Toward Fair Graph Message Passing against Topology Bias

• Computer Science
ArXiv
• 2022
A Fair Message Passing (FMP) scheme is proposed to aggregate useful information from neighbors but minimize the effect of topology bias in a unified framework considering graph smoothness and fairness objectives.