Fix-A-Step: Effective Semi-supervised Learning from Uncurated Unlabeled Sets

@article{Huang2022FixAStepES,
  title={Fix-A-Step: Effective Semi-supervised Learning from Uncurated Unlabeled Sets},
  author={Zhe Huang and Mary-Joy Sidhom and Benjamin S. Wessler and Michael C. Hughes},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.11870}
}
Semi-supervised learning (SSL) promises gains in accuracy compared to training classifiers on small labeled datasets by also training on many unlabeled images. In realistic applications like medical imaging, unlabeled sets will be collected for expediency and thus uncurated: possibly different from the labeled set in represented classes or class frequencies. Unfortunately, modern deep SSL often makes accuracy worse when given uncurated unlabeled sets. Recent remedies suggest filtering…
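For context, the deep SSL methods surveyed below typically minimize a supervised loss on the small labeled batch plus a weighted loss on the unlabeled batch. A minimal PyTorch sketch of one such combined training step (the function name, the lam weight, and the pluggable unlabeled_loss_fn are illustrative assumptions, not code from the paper):

import torch
import torch.nn.functional as F

def ssl_training_step(model, optimizer, x_labeled, y_labeled, x_unlabeled,
                      unlabeled_loss_fn, lam=1.0):
    # One gradient step of a typical deep SSL multi-task objective:
    # supervised loss on the labeled batch plus a weighted loss on the
    # (possibly uncurated) unlabeled batch. If the unlabeled batch contains
    # out-of-class images, the second term can pull the model in a harmful
    # direction, which is the failure mode this paper targets.
    optimizer.zero_grad()
    labeled_loss = F.cross_entropy(model(x_labeled), y_labeled)
    unlabeled_loss = unlabeled_loss_fn(model, x_unlabeled)  # e.g. a consistency loss
    loss = labeled_loss + lam * unlabeled_loss
    loss.backward()
    optimizer.step()
    return loss.item()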

References

SHOWING 1-10 OF 52 REFERENCES

DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples

A new SSL method called DP-SSL is proposed that adopts an innovative data programming (DP) scheme to generate probabilistic labels for unlabeled data and achieves better classification performance on test sets than existing SSL methods, especially when only a small number of labeled samples are available.

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms

This work creates a unified reimplementation and evaluation platform for several widely used SSL techniques and finds that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples.

Semi-Supervised Learning under Class Distribution Mismatch

This work addresses this under-studied and realistic SSL problem with a novel algorithm named Uncertainty-Aware Self-Distillation (UASD), which produces soft targets that avoid catastrophic error propagation and enables effective learning from unconstrained unlabeled data containing out-of-distribution (OOD) samples.

An Empirical Study and Analysis on Open-Set Semi-Supervised Learning

Style Disturbance is proposed to improve traditional SSL methods in the open-set setting, and experiments show the approach can achieve state-of-the-art results on various datasets by utilizing OOD samples properly.

Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data

A simple and effective safe deep SSL method is proposed to alleviate the harm caused by class distribution mismatch; it is theoretically guaranteed that its generalization error approaches the optimal at rate O(√(d ln(n)/n)), even faster than the convergence rate of supervised learning with massive parameters.

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

This paper demonstrates the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling, and shows that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks.
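The FixMatch unlabeled-loss recipe is simple enough to sketch in a few lines of PyTorch. The function name and batch handling below are our assumptions; the 0.95 confidence threshold follows the paper's default:

import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_weak, x_strong, threshold=0.95):
    # Pseudo-label weakly augmented images, keep only confident predictions,
    # and train the strongly augmented views to match those pseudo-labels.
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=1)
        confidence, pseudo_labels = probs.max(dim=1)
        mask = (confidence >= threshold).float()  # drop low-confidence examples
    logits_strong = model(x_strong)
    per_example = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (per_example * mask).mean()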

Open-World Semi-Supervised Learning

Despite solving the harder task, ORCA outperforms semi-supervised methods on seen classes as well as novel class discovery methods on novel classes, achieving 7% and 151% improvements on seen and novel classes of the ImageNet dataset.

OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers

OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.

MixMatch: A Holistic Approach to Semi-Supervised Learning

This work unifies the current dominant approaches to semi-supervised learning to produce a new algorithm, MixMatch, that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp.
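A sketch of the two MixMatch ingredients named above, label sharpening and MixUp mixing. The default hyperparameters T=0.5 and alpha=0.75 follow the paper, but the code itself is our illustrative assumption:

import torch

def sharpen(probs, T=0.5):
    # Lower the entropy of a guessed label distribution (temperature sharpening).
    p = probs ** (1.0 / T)
    return p / p.sum(dim=1, keepdim=True)

def mixup(x1, y1, x2, y2, alpha=0.75):
    # Convexly combine two batches of examples and their label distributions.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)  # MixMatch keeps the mix closer to the first batch
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2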

Universal Semi-Supervised Learning

The proposed CAFA framework requires no prior knowledge of the class relationship between the labeled and unlabeled datasets and conducts domain adaptation to fully exploit the detected class-sharing data for better semi-supervised consistency training.
...