Corpus ID: 219721239

Big Self-Supervised Models are Strong Semi-Supervised Learners

Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey E. Hinton
One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to most previous approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of a big (deep and wide) network during… 
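The recipe the abstract describes — task-agnostic pretraining on unlabeled data, then supervised fine-tuning on a small labeled set — can be sketched in a few lines. Everything below is a mock: the random-projection "encoder" stands in for a real pretrained backbone (e.g. a SimCLR network), and the synthetic data and linear head are illustrative only, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (assumed already done): a task-agnostic pretrained encoder.
# A fixed random projection is a stand-in for a real backbone.
W_enc = rng.normal(size=(8, 4))

def encoder(x):
    return np.tanh(x @ W_enc)

# Stage 2: supervised fine-tuning of a small head on few labeled examples.
X = rng.normal(size=(32, 8))
y = (X[:, 0] > 0).astype(int)          # synthetic 2-class labels

W_head = np.zeros((4, 2))
for _ in range(300):
    logits = encoder(X) @ W_head
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    grad = encoder(X).T @ (p - np.eye(2)[y]) / len(y)   # softmax CE gradient
    W_head -= 0.5 * grad

acc = ((encoder(X) @ W_head).argmax(axis=1) == y).mean()
```

The paper's full recipe adds a third stage, distilling the fine-tuned big network into a smaller one using the unlabeled data again; that step is omitted here.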


An Overview of Deep Semi-Supervised Learning

A comprehensive overview of deep semi-supervised learning is provided, starting with an introduction to the field, followed by a summarization of the dominant semi-supervised approaches in deep learning.

Streaming Self-Training via Domain-Agnostic Unlabeled Images

We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models such that a non-expert user can define a new task depending on their needs via a…

FROST: Faster and more Robust One-shot Semi-supervised Training

By combining semi-supervised learning with a one-stage, single network version of self-training, the FROST methodology trains faster and is more robust to choices for the labeled samples and changes in hyper-parameters.

SEED: Self-supervised Distillation For Visual Representation

This paper proposes a new learning paradigm, named SElf-SupErvised Distillation (SEED), where a larger network is leveraged to transfer its representational knowledge into a smaller architecture in a self-supervised fashion, and shows that SEED dramatically boosts the performance of small networks on downstream tasks.
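As a rough illustration of the distillation idea — not SEED's actual objective, which matches teacher and student similarity distributions over a maintained queue — a temperature-softened KL loss might look like:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions:
    the student is trained to mimic the teacher's soft outputs."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float((p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean())

teacher = np.array([[4.0, 1.0, -2.0]])
student = np.array([[1.0, 1.0, 1.0]])   # uninformed student -> positive loss
```

A student whose logits already match the teacher's incurs zero loss, which is why minimizing this term pulls the small network toward the large one's behavior.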

Self-Tuning for Data-Efficient Deep Learning

Self-Tuning is presented to enable data-efficient deep learning by unifying the exploration of labeled and unlabeled data with the transfer of a pre-trained model, together with a Pseudo Group Contrast (PGC) mechanism to mitigate the reliance on pseudo-labels and boost the tolerance to false labels.

How Well Do Self-Supervised Models Transfer?

It is shown that on most tasks the best self-supervised models outperform supervision, confirming the recently observed trend in the literature and finding ImageNet Top-1 accuracy to be highly correlated with transfer to many-shot recognition, but increasingly less so for few-shot, object detection and dense prediction.

Self-Supervised Learning for Large-Scale Unsupervised Image Clustering

This paper proposes a simple scheme for unsupervised classification based on self-supervised representations and evaluates the proposed approach with several recent self-supervised methods, showing that it achieves competitive results for ImageNet classification.

On the Marginal Benefit of Active Learning: Does Self-Supervision Eat its Cake?

This paper provides a novel algorithmic framework integrating self-supervised pretraining, active learning, and consistency-regularized self-training, and fails to observe any additional benefit from state-of-the-art active learning algorithms when combined with state-of-the-art S4L techniques.

Are Fewer Labels Possible for Few-shot Learning?

Transductive unsupervised pretraining is proposed that achieves a better clustering by involving target data even though its amount is very limited, and the improved clustering result is of great value for identifying the most representative samples for users to label.

Self-supervised Pretraining of Visual Features in the Wild

The final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting.

Unsupervised Data Augmentation for Consistency Training

A new perspective on how to effectively noise unlabeled examples is presented and it is argued that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.

Billion-scale semi-supervised learning for image classification

This paper proposes a pipeline, based on a teacher/student paradigm, that leverages a large collection of unlabelled images to improve the performance for a given target architecture, like ResNet-50 or ResNeXt.

Exploring the Limits of Weakly Supervised Pretraining

This paper presents a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images and shows improvements on several image classification and object detection tasks, and reports the highest ImageNet-1k single-crop, top-1 accuracy to date.

Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning

An unsupervised loss function is proposed that takes advantage of the stochastic nature of these methods and minimizes the difference between the predictions of multiple passes of a training sample through the network.
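That unsupervised loss can be sketched as the squared difference between two stochastic forward passes of the same sample. The toy below uses Gaussian input noise as a stand-in for the dropout and random-augmentation randomness a real network would have; the weights and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))             # toy network weights

def noisy_forward(x, sigma):
    """One stochastic pass: Gaussian input noise stands in for
    dropout / random augmentation inside a real network."""
    return np.tanh((x + rng.normal(scale=sigma, size=x.shape)) @ W)

def consistency_loss(x, sigma=0.1):
    """Unsupervised loss: mean squared difference between the
    predictions of two stochastic passes of the same sample."""
    out1 = noisy_forward(x, sigma)
    out2 = noisy_forward(x, sigma)
    return float(np.mean((out1 - out2) ** 2))

x = rng.normal(size=(16, 5))
```

With the stochasticity turned off the two passes agree exactly and the loss vanishes, which is the sense in which minimizing it enforces prediction stability under perturbation.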

Temporal Ensembling for Semi-Supervised Learning

Self-ensembling is introduced, where it is shown that this ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
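The ensembling step itself is a bias-corrected exponential moving average over per-epoch predictions. A minimal sketch, assuming the paper's reported momentum default of α = 0.6 and treating each epoch's predictions as a plain array:

```python
import numpy as np

def temporal_ensemble_targets(pred_history, alpha=0.6):
    """Bias-corrected EMA over a list of per-epoch prediction arrays.
    The final ensembled predictions serve as training targets for
    the unlabeled examples."""
    Z = np.zeros_like(pred_history[0], dtype=float)
    target = Z
    for t, z in enumerate(pred_history, start=1):
        Z = alpha * Z + (1 - alpha) * z   # accumulate the ensemble
        target = Z / (1 - alpha ** t)     # startup bias correction
    return target

p = np.array([[0.2, 0.8], [0.9, 0.1]])
```

If the network's predictions are constant across epochs, the bias-corrected target reproduces them exactly; when they fluctuate, the EMA smooths the fluctuations out, which is why it makes a better label target than any single epoch's output.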

Unsupervised Data Augmentation

UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods, which leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small.

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms

This work creates a unified reimplementation and evaluation platform of various widely-used SSL techniques and finds that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples.

Revisiting Self-Supervised Visual Representation Learning

This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.

Milking CowMask for Semi-Supervised Image Classification

A novel mask-based augmentation method called CowMask is presented, using it to provide perturbations for semi-supervised consistency regularization, which achieves a state-of-the-art result on ImageNet with 10% labeled data.

Meta Pseudo Labels

We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, 1.6% better than the existing state of the art.