Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
@inproceedings{Tarvainen2017MeanTA,
  title     = {Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results},
  author    = {Antti Tarvainen and Harri Valpola},
  booktitle = {NIPS},
  year      = {2017}
}
The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. […] As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling. Without changing the network architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250 labels, outperforming Temporal Ensembling trained with 1000 labels. We also show that a good network architecture is crucial to performance…
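The abstract's central idea, averaging model weights rather than label predictions, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' released code; the decay rate `alpha`, the MSE consistency loss, and the single-view setup below are simplifying assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(student):
    # The teacher starts as a copy of the student and is never updated by gradients.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def update_teacher(student, teacher, alpha=0.999):
    # Teacher weights are an exponential moving average of the student weights.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

def consistency_loss(student, teacher, x_unlabeled):
    # Penalize disagreement between student and teacher predictions on the same
    # unlabeled input (in practice each model sees its own noisy view of it).
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x_unlabeled), dim=1)
    student_probs = F.softmax(student(x_unlabeled), dim=1)
    return F.mse_loss(student_probs, teacher_probs)
```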
1,707 Citations
Unsupervised Data Augmentation for Consistency Training
- Computer Science, NeurIPS
- 2020
A new perspective on how to effectively add noise to unlabeled examples is presented, and it is argued that the quality of the noise, in particular noise produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
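A rough sketch of the consistency objective this summary describes, in PyTorch; the sharpening temperature, the confidence threshold, and the KL form of the loss are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def uda_consistency(model, x_original, x_strongly_augmented,
                    temperature=0.4, threshold=0.8):
    # Target: sharpened prediction on the original (clean) unlabeled input.
    with torch.no_grad():
        logits = model(x_original)
        probs = F.softmax(logits, dim=1)
        target = F.softmax(logits / temperature, dim=1)           # sharpened target
        mask = probs.max(dim=1).values.ge(threshold).float()      # drop low-confidence targets
    # Prediction on the strongly augmented view should match the target.
    log_probs_aug = F.log_softmax(model(x_strongly_augmented), dim=1)
    kl = F.kl_div(log_probs_aug, target, reduction="none").sum(dim=1)
    return (kl * mask).mean()
```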
Smooth Neighbors on Teacher Graphs for Semi-Supervised Learning
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
A novel method, called Smooth Neighbors on Teacher Graphs (SNTG), builds a graph over the teacher's predictions that serves as a similarity measure, with respect to which the representations of "similar" neighboring points are learned to be smooth on the low-dimensional manifold; it achieves state-of-the-art results on semi-supervised learning benchmarks.
Unsupervised Domain Adaptation using Generative Models and Self-ensembling
- Computer Science, ArXiv
- 2018
The results suggest that self-ensembling is better than simple data augmentation with the newly generated data, and that a single model trained this way can achieve the best performance across all the different transfer tasks.
SELF: Learning to Filter Noisy Labels with Self-Ensembling
- Computer Science, ICLR
- 2020
This work presents self-ensemble label filtering (SELF), a simple and effective method that progressively filters out wrong labels during training; it substantially outperforms all previous works on noise-aware learning across different datasets and can be applied to a broad set of network architectures.
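A minimal sketch of a filtering criterion of the kind this summary describes: a label is kept only while a running (self-ensembled) prediction agrees with it. The EMA bookkeeping and the agreement test below are assumptions, not the paper's exact procedure.

```python
import torch

class LabelFilter:
    # Maintains an exponential moving average of per-sample predictions and
    # keeps a sample's label only while that ensemble prediction agrees with it.
    def __init__(self, num_examples, num_classes, momentum=0.9):
        self.momentum = momentum
        self.ema_probs = torch.full((num_examples, num_classes), 1.0 / num_classes)

    @torch.no_grad()
    def update(self, indices, probs):
        self.ema_probs[indices] = (self.momentum * self.ema_probs[indices]
                                   + (1.0 - self.momentum) * probs)

    def keep_mask(self, indices, given_labels):
        # True where the ensembled prediction agrees with the provided label;
        # the remaining samples can be reused as unlabeled data.
        return self.ema_probs[indices].argmax(dim=1).eq(given_labels)
```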
Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones
- Computer Science, ArXiv
- 2021
This paper proposes to improve existing baseline networks via knowledge distillation from off-the-shelf pre-trained big powerful models, by only driving the student model's predictions to be consistent with those of the teacher model, and finds that this simple distillation setting is extremely effective.
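The "prediction consistency only" distillation described above can be sketched as a soft-target KL loss between a frozen teacher and the student; the temperature and the batchmean scaling are conventional choices assumed here, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def prediction_matching_loss(student_logits, teacher_logits, temperature=1.0):
    # No ground-truth labels: the student is trained only to match the
    # (frozen) teacher's predictive distribution.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits / temperature, dim=1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```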
AdaReNet: Adaptive Reweighted Semi-supervised Active Learning to Accelerate Label Acquisition
- Computer Science, PETRA
- 2021
This work takes a holistic approach to label acquisition, considering the expansion of the clean and pseudo-labeled subsets jointly, and introduces a collaborative teacher-student framework in which the teacher learns a data-driven curriculum.
Improving Consistency-Based Semi-Supervised Learning with Weight Averaging
- Computer Science, ArXiv
- 2018
It is shown that consistency regularization leads to flatter but narrower optima for semi-supervised models, and that with fast-SWA the simple $\Pi$ model becomes state-of-the-art for large labeled settings.
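A small sketch of the weight-averaging component this summary refers to (SWA/fast-SWA collect weights at several points along the training trajectory and average them); when to call `collect`, e.g. at cyclical learning-rate minima, is part of the method's schedule and is left out here.

```python
import copy
import torch

class WeightAverager:
    # Keeps a running average of model weights collected during training.
    # (Batch-norm statistics of avg_model should be recomputed before evaluation.)
    def __init__(self, model):
        self.avg_model = copy.deepcopy(model)
        self.count = 0

    @torch.no_grad()
    def collect(self, model):
        # Incremental mean: avg <- avg + (w - avg) / n
        self.count += 1
        for avg_p, p in zip(self.avg_model.parameters(), model.parameters()):
            avg_p.add_((p - avg_p) / self.count)
```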
DMT: Dynamic Mutual Training for Semi-Supervised Learning
- Computer Science, Pattern Recognition
- 2022
AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation
- Computer Science
- 2022
AdaMatch, a unified solution for unsupervised domain adaptation, is introduced and it is found that AdaMatch either matches or significantly exceeds the state-of-the-art in each case using the same hyper-parameters regardless of the dataset or task.
Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
- Computer Science
- 2021
This paper addresses the prediction accuracy problem of consistency learning methods with novel extensions of the mean-teacher model, which include a new auxiliary teacher, and the replacement of MT’s mean square error (MSE) by a stricter confidence-weighted cross-entropy (Conf-CE) loss.
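A hedged sketch of a confidence-weighted cross-entropy of the kind described above, for the segmentation case where logits have shape (N, C, H, W); the 0.9 threshold and the exact weighting scheme are placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def conf_weighted_ce(student_logits, teacher_logits, threshold=0.9):
    # Teacher provides hard pseudo-labels; the student's cross-entropy against
    # them is weighted by teacher confidence, and low-confidence positions are
    # masked out entirely.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits, dim=1)
        confidence, pseudo_labels = teacher_probs.max(dim=1)
        weight = confidence * (confidence >= threshold).float()
    ce = F.cross_entropy(student_logits, pseudo_labels, reduction="none")
    return (weight * ce).sum() / weight.sum().clamp(min=1e-8)
```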
References
SHOWING 1-10 OF 48 REFERENCES
Temporal Ensembling for Semi-Supervised Learning
- Computer Science, ICLR
- 2017
Self-ensembling is introduced: a consensus prediction of the unknown labels is formed from the network's outputs at different training epochs and under different regularization and augmentation conditions; this ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch and can thus be used as a target for training.
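A minimal sketch of the prediction-averaging scheme this summary describes (averaging predictions across epochs, rather than weights); the momentum value, per-batch updating, and the startup bias correction follow the commonly used formulation and are assumptions here.

```python
import torch

class TemporalEnsemble:
    # Exponential moving average of each example's predictions across epochs,
    # used as the consistency target for the next epoch.
    def __init__(self, num_examples, num_classes, alpha=0.6):
        self.alpha = alpha
        self.accumulated = torch.zeros(num_examples, num_classes)

    @torch.no_grad()
    def update(self, indices, probs):
        self.accumulated[indices] = (self.alpha * self.accumulated[indices]
                                     + (1.0 - self.alpha) * probs)

    def targets(self, indices, epoch):
        # Bias-correct the EMA so early-epoch targets are not shrunk toward zero.
        return self.accumulated[indices] / (1.0 - self.alpha ** (epoch + 1))
```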
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
- Computer Science, NIPS
- 2016
An unsupervised loss function is proposed that takes advantage of the stochastic nature of randomized techniques such as data augmentation and dropout, minimizing the difference between the predictions of multiple passes of a training sample through the network.
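A sketch of the stability idea in this summary: the same sample is passed through the network several times under stochastic augmentation and dropout, and the pairwise disagreement between the resulting predictions is penalized. `augment` stands for whatever stochastic transformation is used and is assumed here.

```python
import torch.nn.functional as F

def stability_loss(model, x, augment, num_passes=2):
    # Each pass differs because of random augmentation and dropout.
    probs = [F.softmax(model(augment(x)), dim=1) for _ in range(num_passes)]
    loss = x.new_zeros(())
    for i in range(num_passes):
        for j in range(i + 1, num_passes):
            loss = loss + F.mse_loss(probs[i], probs[j])
    return loss
```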
Swapout: Learning an ensemble of deep architectures
- Computer Science, NIPS
- 2016
This work describes Swapout, a new stochastic training method that outperforms ResNets of identical network structure, yielding impressive results on CIFAR-10 and CIFAR-100, and proposes a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Computer Science, ICML
- 2015
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Shake-Shake regularization
- Computer Science, ArXiv
- 2017
The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel…
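A forward-pass-only sketch of the idea the truncated sentence describes: the plain sum of two parallel residual branches is replaced by a random convex combination during training. The full method also re-draws the coefficient in the backward pass, which is omitted here.

```python
import torch
import torch.nn as nn

class ShakeShakeBlock(nn.Module):
    # Two-branch residual block: at training time the branch outputs are mixed
    # with a random coefficient instead of being summed; at test time the
    # expected coefficient (0.5) is used.
    def __init__(self, branch1, branch2):
        super().__init__()
        self.branch1 = branch1
        self.branch2 = branch2

    def forward(self, x):
        alpha = torch.rand(1, device=x.device) if self.training else 0.5
        return x + alpha * self.branch1(x) + (1.0 - alpha) * self.branch2(x)
```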
Deep Networks with Stochastic Depth
- Computer Science, ECCV
- 2016
Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup of training short networks while using deep networks at test time; it reduces training time substantially and improves the test error significantly on almost all data sets used for evaluation.
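A compact sketch of the training procedure this summary describes: each residual block's transform branch is skipped at random during training and scaled by its survival probability at test time. Per-batch dropping and a fixed survival probability are simplifying assumptions; the paper uses a survival probability that decays linearly with depth.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    # Residual block whose transform branch is randomly dropped while training.
    def __init__(self, transform, survival_prob=0.8):
        super().__init__()
        self.transform = transform
        self.survival_prob = survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.survival_prob:
                return x + self.transform(x)
            return x  # block skipped: identity shortcut only
        # Test time: use the full (deep) network, rescaling by survival prob.
        return x + self.survival_prob * self.transform(x)
```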
Aggregated Residual Transformations for Deep Neural Networks
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
Deep Residual Learning for Image Recognition
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Variational Autoencoder for Deep Learning of Images, Labels and Captions
- Computer Science, NIPS
- 2016
A novel variational autoencoder is developed to model images, as well as associated labels or captions, and a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.
On Calibration of Modern Neural Networks
- Computer Science, ICML
- 2017
It is discovered that modern neural networks, unlike those from a decade ago, are poorly calibrated, and on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.
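Temperature scaling is simple enough to sketch directly: a single scalar T is fit on held-out logits by minimizing the negative log-likelihood, and predictions are then made with softmax(logits / T). The paper optimizes T with LBFGS; plain gradient descent via Adam is used below as a simplification.

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, steps=200, lr=0.01):
    # Learn one scalar T > 0 on a held-out set; model weights stay fixed.
    log_t = torch.zeros(1, requires_grad=True)  # T = exp(log_t) stays positive
    optimizer = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        optimizer.step()
    return log_t.exp().item()

# Calibrated probabilities: F.softmax(test_logits / T, dim=1) with the fitted T.
```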