Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection

Cong Chen, Shouyang Dong, Ye Tian, Kunlin Cao, Li Liu, Yuanhao Guo
This paper focuses on Semi-Supervised Object Detection (SSOD). Knowledge Distillation (KD) has been widely used for semi-supervised image classification. However, adapting these methods to SSOD faces the following obstacles. (1) The teacher model serves a dual role as a teacher and a student, such that the teacher predictions on unlabeled images may be very close to those of the student, which limits the upper bound of the student. (2) The class imbalance issue in SSOD hinders an efficient knowledge…

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection
This work proposes a weakly- and semi-supervised object detection framework (WSSOD), which involves a two-stage learning procedure that demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, achieving performance comparable to that obtained in fully-supervised settings with only one third of the annotations.
Domain Adaptive Semantic Segmentation with Regional Contrastive Consistency Regularization
A novel and fully end-to-end trainable approach, called regional contrastive consistency regularization (RCCR), is proposed for domain adaptive semantic segmentation; it pulls together similar regional features extracted from the same location of different images and pushes apart features from different locations of the two images.


Leveraging Prior-Knowledge for Weakly Supervised Object Detection Under a Collaborative Self-Paced Curriculum Learning Framework
Comprehensive experiments on benchmark datasets demonstrate the superior capacity of the proposed C-SPCL regime and the proposed whole framework as compared with state-of-the-art methods along this research line.
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks, but it becomes unwieldy when learning on large datasets, so Mean Teacher, a method that averages model weights instead of label predictions, is proposed.
Temporal Ensembling for Semi-Supervised Learning
Self-ensembling is introduced, where it is shown that this ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
Min-Entropy Latent Model for Weakly Supervised Object Detection
A min-entropy latent model (MELM) is proposed for weakly supervised object detection, unified with feature learning and optimized with a recurrent learning algorithm, which progressively transfers the weak supervision to object locations.
Data Distillation: Towards Omni-Supervised Learning
It is argued that visual recognition models have recently become accurate enough that it is now possible to apply classic ideas about self-training to challenging real-world data, and data distillation is proposed: a method that ensembles predictions from multiple transformations of unlabeled data, using a single model, to automatically generate new training annotations.
Scaling and Benchmarking Self-Supervised Visual Representation Learning
It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.
Multi-task Self-Supervised Visual Learning
The results show that deeper networks work better, and that combining tasks, even via a naïve multihead architecture, always improves performance.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Consistency-based Semi-supervised Learning for Object detection
A Consistency-based Semi-supervised learning method for object Detection (CSD), which is a way of using consistency constraints as a tool for enhancing detection performance by making full use of available unlabeled data.
Revisiting Self-Supervised Visual Representation Learning
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.
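
The Temporal Ensembling and Mean Teacher entries above differ mainly in what is averaged over training: per-sample predictions in the former, model weights in the latter. The following is a minimal sketch of that difference, not the implementation from either paper or from the self-ensembling teacher described in the abstract. The function names, the plain-list representation of predictions and weights, and the smoothing factor `alpha` are all assumptions made for illustration.

```python
# Illustrative sketch only: names and the list-based representation are
# assumptions, not code from any of the papers listed above.

def ema_update(average, current, alpha):
    """Element-wise exponential moving average of two equal-length lists."""
    return [alpha * a + (1.0 - alpha) * c for a, c in zip(average, current)]


def temporal_ensembling_target(ensemble, prediction, alpha, epoch):
    """Average *predictions* for one sample across epochs.

    `epoch` is 1-based. Dividing by (1 - alpha**epoch) corrects the
    startup bias caused by initialising the ensemble at zero.
    Returns the updated ensemble and the bias-corrected training target.
    """
    ensemble = ema_update(ensemble, prediction, alpha)
    correction = 1.0 - alpha ** epoch
    target = [z / correction for z in ensemble]
    return ensemble, target


def mean_teacher_update(teacher_weights, student_weights, alpha):
    """Average *weights*: the teacher tracks an EMA of the student."""
    return ema_update(teacher_weights, student_weights, alpha)


if __name__ == "__main__":
    # Prediction averaging: one sample, two classes, first epoch.
    ensemble, target = temporal_ensembling_target(
        [0.0, 0.0], [0.8, 0.2], alpha=0.6, epoch=1
    )
    print("ensemble:", ensemble, "target:", target)

    # Weight averaging: a two-parameter "model".
    teacher = mean_teacher_update([1.0, 0.0], [0.0, 1.0], alpha=0.9)
    print("teacher weights:", teacher)
```

In this sketch the prediction-averaging variant needs one stored vector per training sample, whereas the weight-averaging variant stores only a second copy of the model parameters. That is one way to read the "unwieldy on large datasets" remark in the Mean Teacher summary.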