Corpus ID: 235825939

Dash: Semi-Supervised Learning with Dynamic Thresholding

Authors: Yi Xu, Lei Shang, Jinxing Ye, Qi Qian, Yu-Feng Li, Baigui Sun, Hao Li, Rong Jin
While semi-supervised learning (SSL) has received tremendous attention in many machine learning tasks due to its successful use of unlabeled data, existing SSL algorithms use either all unlabeled examples or only the unlabeled examples with a fixed high-confidence prediction during training. Consequently, too many correctly pseudo-labeled examples may be eliminated, or too many wrongly pseudo-labeled examples may be selected. In this work we develop a simple yet powerful framework, whose key idea is to select a subset of…
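The abstract contrasts two existing selection rules (use all unlabeled examples, or only those above a fixed high-confidence threshold) with selecting a subset dynamically. The Python sketch below illustrates that contrast on toy predictions. It assumes that dynamic thresholding means keeping unlabeled examples whose pseudo-label loss falls below a threshold that shrinks geometrically during training; the schedule, its parameters (`rho_hat`, `gamma`, `c`), and every function name are hypothetical choices for illustration, not details recovered from the truncated abstract.

```python
import math

def fixed_confidence_mask(probs, tau=0.95):
    # Fixed rule: keep an unlabeled example only if its top class probability >= tau.
    return [max(p) >= tau for p in probs]

def pseudo_label_loss(p):
    # Cross-entropy of a prediction against its own hard pseudo-label (the argmax),
    # which reduces to -log(max probability).
    return -math.log(max(p))

def dynamic_threshold(rho_hat, step, gamma=1.5, c=1.0):
    # HYPOTHETICAL geometric schedule: rho_t = c * gamma ** -(step - 1) * rho_hat.
    # With gamma > 1 the threshold shrinks as the training step increases.
    return c * gamma ** (-(step - 1)) * rho_hat

def dynamic_mask(probs, rho_t):
    # Dynamic rule: keep examples whose pseudo-label loss is below the current threshold.
    return [pseudo_label_loss(p) < rho_t for p in probs]

# Toy softmax outputs for three unlabeled examples.
probs = [[0.97, 0.02, 0.01], [0.6, 0.3, 0.1], [0.4, 0.35, 0.25]]
print(fixed_confidence_mask(probs))  # [True, False, False]
for step in (1, 2, 3):
    print(step, dynamic_mask(probs, dynamic_threshold(1.0, step)))
# 1 [True, True, True]
# 2 [True, True, False]
# 3 [True, False, False]
```

Under this assumed schedule, the loose early threshold admits most examples and then tightens so that only low-loss (high-confidence) examples remain, whereas the fixed rule applies the same cut from the first step.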


DoubleMatch: Improving Semi-Supervised Learning with Self-Supervision

This work proposes DoubleMatch, a new SSL algorithm that combines pseudo-labeling with a self-supervised loss, enabling the model to use all unlabeled data during training. The method achieves state-of-the-art accuracy on multiple benchmark datasets while also reducing training time compared with existing SSL methods.

LaSSL: Label-Guided Self-Training for Semi-supervised Learning

This paper proposes a Label-guided Self-training approach to Semi-supervised Learning (LaSSL), which improves pseudo-label generation through two mutually boosting strategies. LaSSL is evaluated on several classification benchmarks under partially labeled settings and outperforms state-of-the-art approaches.

Debiased Self-Training for Semi-Supervised Learning

Debiased Self-Training (DST) is proposed. It can be seamlessly adapted to other self-training methods, where it helps stabilize training and balance performance across classes, both when training from scratch and when fine-tuning pre-trained models.

PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification

While much recent work in semi-supervised learning (SSL) has achieved strong performance on single-label classification problems, an equally important yet underexplored problem is how to leverage…

Revisiting Consistency Regularization for Deep Partial Label Learning

A new regularized training framework is proposed for partial label learning (PLL). It performs supervised learning on non-candidate labels and applies consistency regularization on candidate labels, instantiating the regularization term by matching the outputs of multiple augmentations of an instance to a conformal label distribution.

USB: A Unified Semi-supervised Learning Benchmark

A Unified SSL Benchmark (USB) is constructed from 15 diverse, challenging, and comprehensive tasks spanning computer vision (CV), natural language processing (NLP), and audio processing (Audio), on which dominant SSL methods are systematically evaluated. A modular and extensible codebase is also open-sourced for fair evaluation of these SSL methods.

FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

This work proposes FreeMatch, a self-adaptive thresholding method for semi-supervised learning, and introduces a self-adaptive class fairness regularization penalty that encourages the model to produce diverse predictions during the early stages of training. Experiments indicate the superiority of FreeMatch, especially when labeled data are extremely scarce.

RDA: Reciprocal Distribution Alignment for Robust Semi-supervised Learning

In this work, we propose Reciprocal Distribution Alignment (RDA) to address semi-supervised learning (SSL). RDA is a hyperparameter-free framework that is independent of confidence threshold and…

ConMatch: Semi-Supervised Learning with Confidence-Guided Consistency Regularization

We present a novel semi-supervised learning framework that intelligently leverages the consistency regularization between the model’s predictions from two strongly-augmented views of an image,…

The Rich Get Richer: Disparate Impact of Semi-Supervised Learning

The disparate impacts of deploying SSL are revealed: the sub-population that has a higher baseline accuracy without SSL tends to benefit more from SSL, while the sub-population that suffers from a low baseline accuracy might even observe a performance drop after adding the SSL module.

Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data

A simple and effective safe deep SSL method is proposed to alleviate the harm caused by class distribution mismatch. Its generalization is theoretically guaranteed to approach the optimal one at the rate $O(\sqrt{d \ln(n)/n})$, which is even faster than the convergence rate in supervised learning associated with massive parameters.

Unsupervised Data Augmentation for Consistency Training

A new perspective on how to effectively noise unlabeled examples is presented, arguing that the quality of the noise, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
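Consistency training on noised unlabeled examples can be sketched as a penalty on the divergence between the model's prediction for a clean unlabeled example and its prediction for a noised (augmented) version of that example. This minimal Python sketch assumes a KL-divergence penalty and takes both predicted distributions as given; the model and the augmentation step are not shown, and the function names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(clean_probs, noised_probs):
    # Mean KL divergence between the prediction on each clean unlabeled example
    # and the prediction on its noised version. In a real training loop the clean
    # prediction would usually be held fixed (no gradient) and act as the target.
    pairs = list(zip(clean_probs, noised_probs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)

clean = [[0.9, 0.1], [0.5, 0.5]]
noised = [[0.9, 0.1], [0.8, 0.2]]
print(consistency_loss(clean, noised))
```

The loss is zero when the noised prediction matches the clean one and grows as augmentation changes the model's output, which is why the choice of noise matters.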

Learning Safe Prediction for Semi-Supervised Regression

This work considers learning a safe prediction from multiple semi-supervised regressors, one that is not worse than a direct supervised learner trained only on labeled data, and shows that the proposal is provably safe and achieves the maximal performance gain.

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms

This work creates a unified reimplementation and evaluation platform for various widely used SSL techniques and finds that the performance of simple baselines that do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples.

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning

This paper proposes a fast and effective approximation of the influence function, a measure of a model's dependency on one training example, and demonstrates that this technique outperforms state-of-the-art methods on semi-supervised image and language classification tasks.

Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning

An unsupervised loss function is proposed that takes advantage of the stochastic nature of transformations and perturbations such as randomized data augmentation, dropout, and random max-pooling, and minimizes the difference between the predictions of multiple passes of a training sample through the network.
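The loss described above can be illustrated by summing, over every pair of stochastic passes of one sample, the squared Euclidean distance between the two prediction vectors. This Python sketch assumes the per-pass predictions are already computed (each pass would use a different random transformation or perturbation); the pairwise squared-distance form and the function names are assumptions made for illustration.

```python
from itertools import combinations

def transformation_stability_loss(passes):
    # passes: n prediction vectors for ONE sample, each from a different
    # stochastic forward pass (different random transformation/perturbation).
    # Sum of squared Euclidean distances over all pairs j < k.
    total = 0.0
    for a, b in combinations(passes, 2):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total

def batch_loss(batch_passes):
    # Sum the per-sample loss over a batch of samples.
    return sum(transformation_stability_loss(p) for p in batch_passes)

passes = [[0.7, 0.3], [0.6, 0.4], [0.7, 0.3]]
print(transformation_stability_loss(passes))
```

Because the loss needs no labels, it can be computed on every unlabeled sample; it is zero only when all passes agree.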

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

This work proposes DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques, which models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples.

Semi-supervised Learning by Entropy Minimization

This framework, which motivates minimum entropy regularization, makes it possible to incorporate unlabeled data into standard supervised learning, and includes other approaches to the semi-supervised problem as particular or limiting cases.
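A minimal sketch of an entropy-regularized objective: the negative log-likelihood on labeled examples plus a weight `lam` times the mean prediction entropy on unlabeled examples, so that confident predictions on unlabeled data lower the loss. The weight value and the function names are assumptions; the sketch shows the general shape of such a criterion, not the paper's exact formulation or optimization procedure.

```python
import math

def entropy(p):
    # Shannon entropy (natural log) of a discrete distribution.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_regularized_loss(labeled_probs, labels, unlabeled_probs, lam=0.1):
    # Supervised term: mean negative log-likelihood of the true labels.
    nll = -sum(math.log(p[y]) for p, y in zip(labeled_probs, labels)) / len(labeled_probs)
    # Unsupervised term: mean entropy of predictions on unlabeled data;
    # minimizing it favors confident predictions on unlabeled examples.
    ent = sum(entropy(p) for p in unlabeled_probs) / len(unlabeled_probs)
    return nll + lam * ent

labeled, labels = [[0.8, 0.2]], [0]
confident, uncertain = [[0.95, 0.05]], [[0.5, 0.5]]
print(entropy_regularized_loss(labeled, labels, confident)
      < entropy_regularized_loss(labeled, labels, uncertain))  # True
```

Setting `lam=0` recovers plain supervised learning, which is one way to see how the regularizer extends the standard objective.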

Unlabeled data: Now it helps, now it doesn't

A finite sample analysis is developed that characterizes the value of unlabeled data and quantifies the performance improvement of SSL over supervised learning. It shows that there are large classes of problems for which SSL can significantly outperform supervised learning in finite sample regimes, and sometimes also in terms of error convergence rates.

Improving Semi-Supervised Support Vector Machines Through Unlabeled Instances Selection

This paper proposes the S3VM-us method, which uses hierarchical clustering to select the unlabeled instances for S3VMs so that only those very likely to be helpful are exploited, while highly risky unlabeled instances are avoided.