A Closer Look at Self-training for Zero-Label Semantic Segmentation

@inproceedings{Pastore2021ACL,
  title={A Closer Look at Self-training for Zero-Label Semantic Segmentation},
  author={Giuseppe Pastore and Fabio Cermelli and Yongqin Xian and Massimiliano Mancini and Zeynep Akata and Barbara Caputo},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2021},
  pages={2687-2696}
}
Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation. Prior zero-label semantic segmentation works approach this task by learning visual-semantic embeddings or generative models. However, they are prone to overfitting on the seen classes because there is no training signal for the unseen ones. In this paper, we study the challenging generalized…
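As a rough illustration of the self-training idea the paper studies, a teacher model labels unlabeled pixels and low-confidence pixels are masked out before training the student. The sketch below is a minimal NumPy illustration, not the paper's implementation; the function name, the confidence threshold, and the `ignore_index` convention are assumptions:

```python
import numpy as np

def generate_pseudo_labels(probs, threshold=0.9, ignore_index=255):
    """Turn per-pixel class probabilities into pseudo labels.

    probs: (H, W, C) softmax output of a teacher model on an unlabeled image.
    Pixels whose top confidence falls below `threshold` are marked with
    `ignore_index` so they contribute no gradient when training the student.
    """
    confidence = probs.max(axis=-1)           # top class probability per pixel
    labels = probs.argmax(axis=-1)            # hard pseudo label per pixel
    labels[confidence < threshold] = ignore_index
    return labels

# Toy example: a 2x2 "image" with 3 classes.
probs = np.array([
    [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]],
    [[0.10, 0.85, 0.05], [0.33, 0.33, 0.34]],
])
pseudo = generate_pseudo_labels(probs, threshold=0.8)
# Confident pixels keep their argmax class; the rest are ignored (255).
```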

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling

TLDR
A cross-modal pseudo-labeling framework that generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images, and self-trains a student model in a way that mitigates the adverse impact of noisy pseudo masks.

A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model

TLDR
This paper rejects the prevalent one-stage FCN-based framework and advocates a two-stage semantic segmentation framework, with the first stage extracting generalizable mask proposals and the second stage leveraging an image-based CLIP model to perform zero-shot classification on the masked image crops generated in the first stage.

Decoupling Zero-Shot Semantic Segmentation

TLDR
A simple and effective zero-shot semantic segmentation model, called ZegFormer, is proposed, which outperforms the previous methods on standard ZS3 benchmarks by large margins.

DenseCLIP: Extract Free Dense Labels from CLIP

TLDR
The findings suggest that DenseCLIP, built on Contrastive Language-Image Pre-training (CLIP) models, can serve as a new, reliable source of supervision for dense prediction tasks, achieving annotation-free semantic segmentation.

ReCo: Retrieve and Co-segment for Zero-shot Transfer

TLDR
This work leverages the retrieval abilities of the language-image pretrained model CLIP to dynamically curate training sets from unlabelled images for arbitrary collections of concept names, and exploits the robust correspondences offered by modern image representations to co-segment entities among the resulting collections.

A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images

TLDR
This survey focuses on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images, and chronologically categorises the approaches into three main periods, namely the pre- and early deep learning era, the fully convolutional era, and the post-FCN era.

Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking

TLDR
This paper establishes a photorealistic simulator to synthesize abundant virtual data, and uses this data to train an initial pose estimation network that serves as a teacher model and generates pose predictions for unlabeled real data.

COSMOS: Cross-Modality Unsupervised Domain Adaptation for 3D Medical Image Segmentation based on Target-aware Domain Translation and Iterative Self-Training

TLDR
This work proposes a self-training based unsupervised domain adaptation framework for 3D medical image segmentation, named COSMOS, and validates it on automatic segmentation of the vestibular schwannoma and cochlea in high-resolution T2 magnetic resonance images (MRI).

Incremental Learning for Panoramic Radiograph Segmentation

TLDR
This study aimed to determine a fundamental method for the automated detection and treatment of dental and orthodontic problems by leveraging incremental learning approaches, using Mask R-CNN as the backbone network on small datasets to construct a more accurate model from automatically labeled data.

References

Showing 1–10 of 50 references

Zero-Shot Semantic Segmentation

TLDR
A novel architecture, ZS3Net, combining a deep visual segmentation model with an approach to generate visual representations from semantic word embeddings is presented, addressing pixel classification tasks where both seen and unseen categories are faced at test time (so-called "generalized" zero-shot classification).

Learning Unbiased Zero-Shot Semantic Segmentation Networks Via Transductive Transfer

TLDR
This letter proposes an easy-to-implement transductive approach to alleviate the prediction bias in zero-shot semantic segmentation, and demonstrates the effectiveness of this approach over the PASCAL dataset.

Semantic Projection Network for Zero- and Few-Label Semantic Segmentation

TLDR
The proposed semantic projection network (SPNet) achieves this goal by incorporating class-level semantic information into any network designed for semantic segmentation, in an end-to-end manner.

Modeling the Background for Incremental Learning in Semantic Segmentation

TLDR
This work revisits classical incremental learning methods and proposes a new distillation-based framework which explicitly accounts for the semantic distribution shift, and introduces a novel strategy to initialize the classifier's parameters, thus preventing biased predictions toward the background class.

Semi-Supervised Semantic Segmentation With Cross-Consistency Training

TLDR
This work observes that for semantic segmentation, the low-density regions are more apparent within the hidden representations than within the inputs, and proposes cross-consistency training, where an invariance of the predictions is enforced over different perturbations applied to the outputs of the encoder.
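The mechanism summarized above, enforcing agreement between predictions made from clean and perturbed encoder outputs, can be sketched in a few lines of NumPy. This is a toy illustration with a single shared linear decoder and an MSE agreement term; the variable names and the choice of feature-noise perturbation are assumptions, and the actual method trains separate auxiliary decoders:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))                   # toy linear "decoder" to 3 classes
z = rng.normal(size=(4, 8))                   # encoder features for 4 pixels
z_pert = z + 0.1 * rng.normal(size=z.shape)   # perturbation of the encoder output

p_main = softmax(z @ W)                       # prediction from clean features
p_aux = softmax(z_pert @ W)                   # prediction from perturbed features
# Consistency term: minimizing it makes the predictions invariant
# to the perturbation, exploiting unlabeled data.
consistency = float(np.mean((p_main - p_aux) ** 2))
```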

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation

TLDR
This work presents a simple and novel re-design of pseudo-labeling to generate well-calibrated structured pseudo labels for training with unlabeled or weakly-labeled data, and demonstrates the effectiveness of the proposed pseudo-labeling strategy in both low-data and high-data regimes.

Generating Visual Representations for Zero-Shot Classification

TLDR
This paper proposes to address ZSC and GZSC by (i) learning a conditional generator using seen classes and (ii) generating artificial training examples for the categories without exemplars, turning the problem into a standard supervised learning task.

Semi-Supervised Semantic Segmentation With High- and Low-Level Consistency

TLDR
This work proposes an approach for semi-supervised semantic segmentation that learns from limited pixel-wise annotated samples while exploiting additional annotation-free images, and achieves significant improvement over existing methods, especially when trained with very few labeled samples.

Exploiting Saliency for Object Segmentation from Image Level Labels

TLDR
This paper proposes using a saliency model as additional information, thereby exploiting prior knowledge of object extent and image statistics, and shows how to combine both information sources to recover 80% of the fully supervised performance for pixel-wise semantic labelling.

What's the Point: Semantic Segmentation with Point Supervision

TLDR
This work takes a natural step from image-level annotation towards stronger supervision: it asks annotators to point to an object if one exists, and incorporates this point supervision along with a novel objectness potential in the training loss function of a CNN model.