Debiased Learning from Naturally Imbalanced Pseudo-Labels

Xudong Wang, Zhi-Li Wu, Long Lian, Stella X. Yu
Published 5 January 2022 · Computer Science
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. If we address this previously unknown imbalanced classification problem arising from… 

An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning

This paper studies a simple yet overlooked baseline, SimiS, which tackles data imbalance by simply supplementing the labeled data with pseudo-labels according to the difference between each class's frequency and that of the most frequent class, and shows great robustness across a wide range of data distributions.

Unsupervised Selective Labeling for More Effective Semi-supervised Learning

This work sets a new standard for practical and efficient SSL by selecting cluster prototypes, either in a pretrained feature space, or along with feature optimization, both without labels, which consistently improves SSL methods over state-of-the-art active learning given labeled data.

Training from a Better Start Point: Active Self-Semi-Supervised Learning for Few Labeled Samples

An active self-semi-supervised learning (AS3L) framework is proposed that improves model performance with few annotations while reducing training time, and it is illustrated that the accuracy of PPL is affected not only by the quality of the features but also by the selection of the labeled samples.

MultiMatch: Multi-task Learning for Semi-supervised Domain Generalization

This paper proposes MultiMatch, which extends FixMatch to a multi-task learning framework to produce high-quality pseudo-labels for SSDG, and outperforms existing semi-supervised methods and the SSDG method on several benchmark DG datasets.

Learning with an Evolving Class Ontology

This paper formalizes a protocol for studying the problem of Learning with Evolving Class Ontology (LECO), and demonstrates that such strategies can surprisingly be made near-optimal, in the sense of approaching an “oracle.”

CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

This paper proposes Class-Rebalancing Self-Training (CReST), a simple yet effective framework that improves existing semi-supervised learning methods on class-imbalanced data, together with a progressive distribution alignment that adaptively adjusts the rebalancing strength, dubbed CReST+.
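A toy sketch of the class-rebalancing idea behind CReST, assuming (as the paper roughly does) that pseudo-labels from rarer classes are kept at a higher rate, with the rate for the k-th most frequent class mirroring the relative frequency of the k-th rarest class raised to a tuning power. The function name and exact rate formula here are illustrative, not the paper's verbatim definition.

```python
def crest_keep_rates(class_counts, alpha=0.5):
    """CReST-style sampling-rate sketch: the most frequent class keeps
    the smallest fraction of its pseudo-labels, the rarest keeps all."""
    order = sorted(range(len(class_counts)), key=lambda c: -class_counts[c])
    n_max = max(class_counts)
    rates = [0.0] * len(class_counts)
    for rank, c in enumerate(order):
        mirror = order[len(order) - 1 - rank]  # class with the mirrored rank
        rates[c] = (class_counts[mirror] / n_max) ** alpha
    return rates
```

With counts like `[100, 10]`, the rare class keeps all of its pseudo-labels while the frequent class keeps only a fraction, which is the self-training bias CReST exploits.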

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

A theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound is proposed that replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling.
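The core of the LDAM loss described above can be sketched in a few lines of plain Python: a per-class margin proportional to n_y^(-1/4) is subtracted from the true-class logit before computing cross-entropy, so rarer classes are pushed to larger margins. The function name and the constant `C` are illustrative; the paper tunes the margin scale.

```python
import math

def ldam_cross_entropy(logits, label, class_counts, C=0.5):
    """LDAM sketch: enforce margin m_y = C / n_y**0.25 on the true class
    by subtracting it from that logit, then take cross-entropy."""
    margin = C / class_counts[label] ** 0.25  # larger margin for rarer classes
    adjusted = list(logits)
    adjusted[label] -= margin
    m = max(adjusted)
    log_sum = m + math.log(sum(math.exp(v - m) for v in adjusted))
    return log_sum - adjusted[label]  # -log softmax of the shifted true logit
```

With identical logits, an example from the rare class incurs a larger loss than one from the frequent class, which is exactly the label-distribution-aware behavior the loss is designed for.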

Unsupervised Data Augmentation for Consistency Training

A new perspective on how to effectively noise unlabeled examples is presented and it is argued that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

This paper demonstrates the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling, and shows that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks.
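The combination FixMatch uses can be sketched in plain Python: a hard pseudo-label is taken from the weakly augmented view, and the strongly augmented view is trained toward it only when the weak view's confidence exceeds a threshold. Names here are illustrative; the real method operates on augmented images inside a training loop.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fixmatch_unlabeled_loss(logits_weak, logits_strong, tau=0.95):
    """FixMatch unlabeled-loss sketch: pseudo-label from the weak view,
    cross-entropy on the strong view, gated by confidence threshold tau."""
    total, kept = 0.0, 0
    for lw, ls in zip(logits_weak, logits_strong):
        pw = softmax(lw)
        conf = max(pw)
        if conf < tau:               # below threshold: example is masked out
            continue
        pseudo = pw.index(conf)      # hard pseudo-label from the weak view
        ps = softmax(ls)
        total += -math.log(ps[pseudo] + 1e-12)
        kept += 1
    return total / max(len(logits_weak), 1), kept
```

The thresholding is what keeps low-confidence (often wrong) pseudo-labels from polluting training, and it is also why the surviving pseudo-labels can end up class-imbalanced, as the main paper above argues.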

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

CoMatch is a new semi-supervised learning method that unifies dominant approaches and addresses their limitations, and achieves substantial accuracy improvements on the label-scarce CIFAR-10 and STL-10.

Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples

Despite the simplicity of the approach, PAWS outperforms other semi-supervised methods across architectures, setting a new state-of-the-art for a ResNet-50 on ImageNet trained with either 10% or 1% of the labels, reaching 75% and 66.5% top-1 respectively.

Big Self-Supervised Models are Strong Semi-Supervised Learners

The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLR), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge.
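The third step, distillation on unlabeled data, amounts to training the student to match the teacher's temperature-softened output distribution. A plain-Python sketch, with hypothetical names and a toy temperature:

```python
import math

def softened(z, T=1.0):
    """Softmax over temperature-scaled logits."""
    zt = [v / T for v in z]
    m = max(zt)
    e = [math.exp(v - m) for v in zt]
    s = sum(e)
    return [v / s for v in e]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Distillation sketch: cross-entropy between the teacher's softened
    distribution and the student's, on an unlabeled example."""
    t = softened(teacher_logits, T)
    s = softened(student_logits, T)
    return -sum(ti * math.log(si + 1e-12) for ti, si in zip(t, s))
```

The loss is minimized when the student reproduces the teacher's distribution, which is how the task-specific knowledge from fine-tuning transfers to a (possibly smaller) student.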

MixMatch: A Holistic Approach to Semi-Supervised Learning

This work unifies the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp.
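The two mechanisms the summary names can be sketched in plain Python: sharpening lowers the entropy of a guessed label distribution, and MixUp linearly interpolates pairs of inputs and labels. Names and constants are illustrative.

```python
import random

def sharpen(p, T=0.5):
    """MixMatch sharpening: raise probabilities to 1/T and renormalize,
    lowering the entropy of the guessed label."""
    pt = [v ** (1.0 / T) for v in p]
    s = sum(pt)
    return [v / s for v in pt]

def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """MixUp sketch with lam clipped to >= 0.5, as MixMatch does, so the
    mixed sample stays closer to its first input."""
    rng = rng or random.Random(0)
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1 - lam)
    mx = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    my = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return mx, my
```

Sharpening `[0.6, 0.4]` at T=0.5, for example, yields roughly `[0.69, 0.31]`, pushing the guessed label toward a one-hot target.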

Data-Centric Semi-Supervised Learning

This work demonstrates that a small compute spent on careful labeled data selection brings big annotation efficiency and model performance gain without changing the learning pipeline.

Deep Metric Transfer for Label Propagation with Limited Annotated Data

This paper proposes a generic framework that utilizes unlabeled data to aid generalization across all three tasks of object recognition, and shows that such a label-propagation scheme can be highly effective when the similarity metric used for propagation is transferred from other related domains.