Big Self-Supervised Models are Strong Semi-Supervised Learners
@article{Chen2020BigSM,
  title   = {Big Self-Supervised Models are Strong Semi-Supervised Learners},
  author  = {Ting Chen and Simon Kornblith and Kevin Swersky and Mohammad Norouzi and Geoffrey E. Hinton},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2006.10029}
}
One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to most previous approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of a big (deep and wide) network during…
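The pipeline the abstract describes, task-agnostic unsupervised pretraining of a large encoder on unlabeled images followed by supervised fine-tuning on the few available labels, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's exact recipe: it assumes a SimCLR-style contrastive objective for the pretraining stage, a ResNet-50 backbone, and placeholder data loaders (`unlabeled_loader` yielding two augmented views per image, `labeled_loader` yielding image/label pairs).

```python
# Hedged sketch of the pretrain-then-fine-tune paradigm; loaders, sizes,
# and optimizer settings are illustrative placeholders, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR-style contrastive loss over two augmented views of a batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)            # (2N, d)
    sim = z @ z.t() / temperature                                  # all-pairs similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                     # exclude self-similarity
    # the positive for view i is the other augmented view of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def build_encoder():
    encoder = resnet50()
    encoder.fc = nn.Identity()                                     # expose 2048-d features
    head = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(), nn.Linear(2048, 128))
    return encoder, head

def pretrain(encoder, head, unlabeled_loader, epochs=100, lr=1e-3):
    """Task-agnostic stage: no labels are used."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=lr)
    for _ in range(epochs):
        for view1, view2 in unlabeled_loader:                      # two augmentations per image
            loss = nt_xent_loss(head(encoder(view1)), head(encoder(view2)))
            opt.zero_grad(); loss.backward(); opt.step()

def finetune(encoder, labeled_loader, num_classes=1000, epochs=30, lr=1e-4):
    """Task-specific stage: fine-tune on the small labeled subset."""
    classifier = nn.Linear(2048, num_classes)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=lr)
    for _ in range(epochs):
        for images, labels in labeled_loader:
            loss = F.cross_entropy(classifier(encoder(images)), labels)
            opt.zero_grad(); loss.backward(); opt.step()
    return classifier
```

The abstract's emphasis on a big (deep and wide) network only changes the backbone choice here; the structure of the two stages is unchanged.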
1,243 Citations
An Overview of Deep Semi-Supervised Learning
- Computer Science, ArXiv
- 2020
A comprehensive overview of deep semi-supervised learning is provided, starting with an introduction to the field, followed by a summarization of the dominant semi-supervised approaches in deep learning.
Streaming Self-Training via Domain-Agnostic Unlabeled Images
- Computer Science, ArXiv
- 2021
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models such that a non-expert user can define a new task depending on their needs via a…
FROST: Faster and more Robust One-shot Semi-supervised Training
- Computer Science, ArXiv
- 2020
By combining semi-supervised learning with a one-stage, single network version of self-training, the FROST methodology trains faster and is more robust to choices for the labeled samples and changes in hyper-parameters.
SEED: Self-supervised Distillation For Visual Representation
- Computer Science, ICLR
- 2021
This paper proposes a new learning paradigm, named SElf-SupErvised Distillation (SEED), in which a larger network is leveraged to transfer its representational knowledge into a smaller architecture in a self-supervised fashion, and shows that SEED dramatically boosts the performance of small networks on downstream tasks.
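To make the summary above concrete, here is a hedged sketch of what a SEED-style distillation objective can look like: the student is trained to match the teacher's softmax similarity distribution over a queue of embeddings, so no labels are needed. The queue maintenance, the temperature values, and details such as including the teacher's own current embedding among the candidates are simplified or assumed here.

```python
# Hedged sketch of a self-supervised distillation loss in the spirit of SEED;
# queue management and temperatures are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_z, teacher_z, queue, t_student=0.2, t_teacher=0.07):
    """Student matches the teacher's similarity distribution over a queue.

    student_z, teacher_z: (N, d) embeddings of the same batch from the two networks.
    queue:                (K, d) embeddings of past samples produced by the teacher.
    """
    student_z = F.normalize(student_z, dim=1)
    teacher_z = F.normalize(teacher_z, dim=1)
    queue = F.normalize(queue, dim=1)
    logits_s = student_z @ queue.t() / t_student     # (N, K) student similarities
    logits_t = teacher_z @ queue.t() / t_teacher     # (N, K) teacher similarities
    target = F.softmax(logits_t, dim=1)              # soft "instance labels" from the teacher
    return F.kl_div(F.log_softmax(logits_s, dim=1), target, reduction='batchmean')
```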
Self-Tuning for Data-Efficient Deep Learning
- Computer Science, ICML
- 2021
Self-Tuning is presented to enable data-efficient deep learning by unifying the exploration of labeled and unlabeled data with the transfer of a pre-trained model, along with a Pseudo Group Contrast (PGC) mechanism to mitigate the reliance on pseudo-labels and boost tolerance to false labels.
How Well Do Self-Supervised Models Transfer?
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
It is shown that on most tasks the best self-supervised models outperform supervised pretraining, confirming a recently observed trend in the literature; ImageNet top-1 accuracy is found to be highly correlated with transfer to many-shot recognition, but increasingly less so for few-shot recognition, object detection, and dense prediction.
Self-Supervised Learning for Large-Scale Unsupervised Image Clustering
- Computer Science, ArXiv
- 2020
This paper proposes a simple scheme for unsupervised classification based on self-supervised representations and evaluates the proposed approach with several recent self-supervised methods, showing that it achieves competitive results for ImageNet classification.
On the Marginal Benefit of Active Learning: Does Self-Supervision Eat its Cake?
- Computer Science, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
This paper provides a novel algorithmic framework integrating self-supervised pretraining, active learning, and consistency-regularized self-training, but observes no additional benefit from state-of-the-art active learning algorithms when they are combined with state-of-the-art S4L techniques.
Are Fewer Labels Possible for Few-shot Learning?
- Computer Science, ArXiv
- 2020
Transductive unsupervised pretraining is proposed, which achieves better clustering by involving the target data even when its amount is very limited; the improved clustering result is of great value for identifying the most representative samples for users to label.
Self-supervised Pretraining of Visual Features in the Wild
- Computer Science, ArXiv
- 2021
The final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting.
References
Showing 1-10 of 74 references
Unsupervised Data Augmentation for Consistency Training
- Computer Science, NeurIPS
- 2020
A new perspective on how to effectively noise unlabeled examples is presented, and it is argued that the quality of the noise, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
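The consistency term this refers to can be sketched roughly as below: the model's prediction on a clean unlabeled image (sharpened, no gradient) serves as the target for its prediction on a strongly augmented copy, with low-confidence examples masked out. The threshold, temperature, and choice of augmentation are assumptions here, not the paper's exact settings.

```python
# Hedged sketch of a UDA-style consistency loss on unlabeled images;
# threshold and temperature are illustrative, not the paper's values.
import torch
import torch.nn.functional as F

def consistency_loss(model, x_clean, x_strong_aug, threshold=0.8, temperature=0.4):
    with torch.no_grad():
        logits_clean = model(x_clean)
        probs = F.softmax(logits_clean, dim=1)
        mask = probs.max(dim=1).values >= threshold                 # confidence masking
        target = F.softmax(logits_clean / temperature, dim=1)       # sharpened target
    log_probs_aug = F.log_softmax(model(x_strong_aug), dim=1)
    per_example = F.kl_div(log_probs_aug, target, reduction='none').sum(dim=1)
    return (per_example * mask.float()).mean()
```

In training, a term like this is added to the ordinary cross-entropy on the labeled batch, typically with a weighting factor.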
Billion-scale semi-supervised learning for image classification
- Computer Science, ArXiv
- 2019
This paper proposes a pipeline, based on a teacher/student paradigm, that leverages a large collection of unlabeled images to improve the performance for a given target architecture, like ResNet-50 or ResNeXt.
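The selection step of such a teacher/student pipeline can be sketched as below, with the caveat that this is a simplified, hypothetical version: the teacher scores a large unlabeled pool and the most confident images per class become pseudo-labeled training data for the student (here each image is assigned only its single top class).

```python
# Hedged sketch of the teacher/student selection step; loader format,
# device handling, and single-class assignment are simplifying assumptions.
import torch

@torch.no_grad()
def select_topk_per_class(teacher, unlabeled_loader, num_classes, k, device='cuda'):
    teacher.eval()
    per_class = [[] for _ in range(num_classes)]        # (confidence, image_index) pairs
    index = 0
    for images in unlabeled_loader:
        probs = teacher(images.to(device)).softmax(dim=1)
        for p in probs:
            c = int(p.argmax())
            per_class[c].append((float(p[c]), index))
            index += 1
    pseudo_labeled = []
    for c in range(num_classes):
        top = sorted(per_class[c], reverse=True)[:k]     # most confident examples of class c
        pseudo_labeled.extend((i, c) for _, i in top)
    return pseudo_labeled                                # list of (image_index, pseudo_label)
```

The student is then typically trained on these pseudo-labeled images and afterwards fine-tuned on the original labeled set.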
Exploring the Limits of Weakly Supervised Pretraining
- Computer Science, ECCV
- 2018
This paper presents a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images, shows improvements on several image classification and object detection tasks, and reports the highest ImageNet-1k single-crop top-1 accuracy to date.
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
- Computer Science, NIPS
- 2016
An unsupervised loss function is proposed that takes advantage of the stochastic nature of these transformations and perturbations and minimizes the difference between the predictions of multiple passes of a training sample through the network.
Temporal Ensembling for Semi-Supervised Learning
- Computer Science, ICLR
- 2017
Self-ensembling is introduced, where the ensemble prediction, formed by accumulating the network's outputs for each sample over previous training epochs, can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
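A hedged sketch of this mechanism: each training sample keeps an exponential moving average of its past predictions, and the bias-corrected average is the consistency target for the current prediction. Updating per batch and the specific hyperparameters below are simplifications of the paper's per-epoch procedure.

```python
# Hedged sketch of temporal ensembling; per-batch updates and the values of
# alpha and the loss choice are simplifying assumptions.
import torch
import torch.nn.functional as F

class TemporalEnsemble:
    def __init__(self, num_samples, num_classes, alpha=0.6):
        self.alpha = alpha
        self.Z = torch.zeros(num_samples, num_classes)   # running average of predictions
        self.epoch = 0

    def update(self, indices, probs):
        # fold the current predictions for these samples into the running average
        self.Z[indices] = self.alpha * self.Z[indices] + (1 - self.alpha) * probs.detach().cpu()

    def end_epoch(self):
        self.epoch += 1

    def targets(self, indices):
        # startup bias correction so early targets are not scaled toward zero
        return self.Z[indices] / (1 - self.alpha ** max(self.epoch, 1))

def consistency_loss(current_probs, ensemble_targets):
    return F.mse_loss(current_probs, ensemble_targets.to(current_probs.device))
```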
Unsupervised Data Augmentation
- Computer Science, ArXiv
- 2019
UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods, which leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small.
Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
- Computer Science, NeurIPS
- 2018
This work creates a unified reimplementation and evaluation platform for various widely-used SSL techniques and finds that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples.
Revisiting Self-Supervised Visual Representation Learning
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.
Milking CowMask for Semi-Supervised Image Classification
- Computer Science, VISIGRAPP
- 2022
A novel mask-based augmentation method called CowMask is presented and used to provide perturbations for semi-supervised consistency regularization, achieving a state-of-the-art result on ImageNet with 10% labeled data.
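The mask construction can be sketched roughly as follows, with the parameter values as assumptions: per-pixel Gaussian noise is smoothed with a Gaussian filter and thresholded so that approximately the desired fraction of pixels is masked, producing irregular "cow spot" regions.

```python
# Hedged sketch of CowMask-style mask generation; sigma and the masked
# proportion are illustrative values, not the paper's settings.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import norm

def cow_mask(height, width, sigma=8.0, proportion=0.5, rng=None):
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal((height, width))
    smooth = gaussian_filter(noise, sigma=sigma)       # low-frequency noise field
    # threshold chosen so that roughly `proportion` of pixels fall below it
    thresh = smooth.mean() + norm.ppf(proportion) * smooth.std()
    return (smooth <= thresh).astype(np.float32)       # 1 = masked region
```

For consistency regularization such a mask can be used either to erase regions of an unlabeled image or to mix two unlabeled images and, correspondingly, their predicted label distributions.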
Meta Pseudo Labels
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art…