• Corpus ID: 235829267

Per-Pixel Classification is Not All You Need for Semantic Segmentation

@inproceedings{Cheng2021PerPixelCI,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}
ADE20K [22] contains 20k images for training and 2k images for validation. The data comes from the ADE20K-Full dataset, from which 150 semantic categories are selected for evaluation in the SceneParse150 challenge [21]. Images are resized such that the shorter side is no greater than 512 pixels. During inference, we resize the shorter side of the image to the corresponding crop size.
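
As a rough illustration of the shorter-side resizing policy described above, here is a minimal Python sketch (PIL-based; the bilinear interpolation mode and the 512-pixel target are illustrative assumptions, not a transcription of the authors' pipeline):

  from PIL import Image

  def resize_shorter_side(img: Image.Image, target: int = 512) -> Image.Image:
      """Resize so the shorter side equals `target`, preserving aspect ratio."""
      w, h = img.size
      scale = target / min(w, h)
      # Interpolation mode is an assumption; the text above does not specify it.
      return img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)

  # Hypothetical usage:
  # img = resize_shorter_side(Image.open("example.jpg"), target=512)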

Few-shot semantic segmentation via mask aggregation

TLDR
The mask aggregation network (MANet), a simple mask classification model, is proposed to simultaneously generate a fixed number of masks together with their probabilities of being targets; the final segmentation result is obtained by aggregating all the masks according to their locations.
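
The aggregation described above is close in spirit to mask-classification inference in general: every predicted mask votes for each pixel in proportion to its class probability. Below is a minimal PyTorch sketch of such an aggregation step (tensor shapes and the softmax/sigmoid choices are assumptions for illustration, not MANet's actual code):

  import torch

  def aggregate_masks(class_logits: torch.Tensor, mask_logits: torch.Tensor) -> torch.Tensor:
      """Combine per-mask class scores with per-mask spatial masks.

      class_logits: [N, K]     scores of N masks over K categories
      mask_logits:  [N, H, W]  per-pixel mask scores for each of the N masks
      Returns an [H, W] map of predicted category indices.
      """
      class_probs = class_logits.softmax(dim=-1)   # [N, K]
      mask_probs = mask_logits.sigmoid()           # [N, H, W]
      # Per-pixel category score: sum over masks of p(class) * p(pixel in mask)
      semseg = torch.einsum("nk,nhw->khw", class_probs, mask_probs)
      return semseg.argmax(dim=0)

  # Example with random tensors (8 masks, 150 categories, 64x64 output):
  pred = aggregate_masks(torch.randn(8, 150), torch.randn(8, 64, 64))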

Unsupervised Multi-object Segmentation Using Attention and Soft-argmax

We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation, which uses a translation-equivariant attention mechanism to …

RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation

TLDR
This paper proposes to decompose segmentation into two sub-problems: (i) image-level or video-level multi-label classification and (ii) pixel-level rank-adaptive selected-label classification, which can be used to improve various existing segmentation frameworks.

A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images

TLDR
This survey focuses on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images, and chronologically categorises the approaches into three main periods, namely the pre- and early deep learning era, the fully convolutional era, and the post-FCN era.

Open-World Entity Segmentation

TLDR
This work proposes a CondInst-like fully-convolutional architecture with two novel modules specifically designed to exploit the class-agnostic and non-overlapping requirements of entity segmentation (ES), and investigates the feasibility of a convolutional center-based representation to segment things and stuff in a unified manner.

Pyramid Fusion Transformer for Semantic Segmentation

TLDR
This study finds that a per-mask classification decoder on top of a single-scale feature is not effective enough to extract reliable probabilities or masks, and proposes a transformer-based Pyramid Fusion Transformer (PFT) for per-mask semantic segmentation on top of multi-scale features.

A Unified Efficient Pyramid Transformer for Semantic Segmentation

TLDR
This work proposes a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts, and demonstrates promising performance on three popular semantic segmentation benchmarks with a low memory footprint.

A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model

TLDR
This paper rejects the prevalent one-stage FCN-based framework and advocates a two-stage semantic segmentation framework, with the first stage extracting generalizable mask proposals and the second stage leveraging an image-based CLIP model to perform zero-shot classification on the masked image crops generated in the first stage.
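
A minimal sketch of the two-stage idea summarized above, with the first stage assumed to have already produced class-agnostic binary mask proposals and the CLIP encoders passed in as placeholder callables (`encode_image` and `encode_text` are hypothetical names standing in for a vision-language model, not a real CLIP API):

  import torch

  def classify_masked_crops(image, mask_proposals, class_names, encode_image, encode_text):
      """Second stage: score each class-agnostic mask against text prompts.

      image:          [3, H, W] input image tensor
      mask_proposals: [N, H, W] binary (float) masks from the first stage
      encode_image / encode_text: placeholder encoders returning L2-normalized embeddings
      Returns [N, K] zero-shot class scores.
      """
      text_emb = encode_text([f"a photo of a {c}" for c in class_names])  # [K, D]
      scores = []
      for mask in mask_proposals:
          crop = image * mask.unsqueeze(0)           # zero out background pixels
          img_emb = encode_image(crop.unsqueeze(0))  # [1, D]
          scores.append(img_emb @ text_emb.T)        # cosine similarity per class
      return torch.cat(scores, dim=0)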

MLSeg: Image and Video Segmentation as Multi-Label Classification and Selected-Label Pixel Classification

TLDR
This paper proposes to decompose segmentation into two sub-problems: (i) image-level or video-level multi-label classification and (ii) pixel-level selected-label classification, which is conceptually general and can be applied to various existing segmentation frameworks by simply adding a lightweight multi-label classification branch.
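
As an illustration of the decomposition summarized above (shared with the RankSeg entry earlier), the sketch below restricts per-pixel classification to the labels selected by an image-level multi-label head; the top-k selection rule and tensor shapes are assumptions, not the MLSeg implementation:

  import torch

  def selected_label_segmentation(pixel_logits: torch.Tensor,
                                  image_logits: torch.Tensor,
                                  top_k: int = 8) -> torch.Tensor:
      """pixel_logits: [K, H, W] per-pixel scores over all K categories.
      image_logits:    [K]       image-level multi-label scores.
      Keep only the top-k image-level categories, then classify each pixel
      among the selected labels only.
      """
      selected = image_logits.topk(top_k).indices   # [top_k] category ids
      restricted = pixel_logits[selected]           # [top_k, H, W]
      winner = restricted.argmax(dim=0)             # index into `selected`
      return selected[winner]                       # [H, W] original category ids

  # Example: 150 categories, 64x64 logits, image-level head keeps 8 labels.
  seg = selected_label_segmentation(torch.randn(150, 64, 64), torch.randn(150))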

...

References

SHOWING 1-10 OF 53 REFERENCES

The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes

TLDR
The Mapillary Vistas Dataset is a novel, large-scale street-level image dataset containing 25000 high-resolution images annotated into 66 object categories with additional, instance-specific labels for 37 classes, aiming to significantly further the development of state-of-the-art methods for visual road-scene understanding.

Segmenter: Transformer for Semantic Segmentation

TLDR
This paper introduces Segmenter, a transformer model for semantic segmentation that outperforms the state of the art on both ADE20K and Pascal Context datasets and is competitive on Cityscapes.

Scene Parsing through ADE20K Dataset

TLDR
The ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, is introduced and it is shown that the trained scene parsing networks can lead to applications such as image content removal and scene synthesis.

Convolutional feature masking for joint object and stuff segmentation

  • Jifeng Dai, Kaiming He, Jian Sun
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
TLDR
This paper proposes a joint method to handle objects and “stuff” (e.g., grass, sky, water) in the same framework and presents state-of-the-art results on the PASCAL VOC and the new PASCAL-CONTEXT benchmarks.

Statistical cues for domain specific image segmentation with performance analysis

  • S. Konishi, A. Yuille
  • Computer Science
    Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662)
  • 2000
TLDR
Performance analysis on the Sowerby dataset shows that the learnt models achieve high accuracy in classifying individual pixels into those classes for which the filter responses are approximately spatially homogeneous.

Simultaneous Detection and Segmentation

TLDR
This work builds on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN), introducing a novel architecture tailored for SDS, and uses category-specific, top-down figure-ground predictions to refine the bottom-up proposals.

Pyramid Scene Parsing Network

TLDR
This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.
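
For context on the pyramid pooling module mentioned above: it pools the feature map to several grid sizes, projects each pooled map, upsamples back, and concatenates with the input. A minimal PyTorch sketch (the 2048-channel input and the 1/2/3/6 bin sizes are the commonly quoted configuration, used here as assumptions rather than the paper's exact code):

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class PyramidPooling(nn.Module):
      def __init__(self, in_ch: int = 2048, bins=(1, 2, 3, 6)):
          super().__init__()
          out_ch = in_ch // len(bins)
          self.stages = nn.ModuleList(
              nn.Sequential(
                  nn.AdaptiveAvgPool2d(b),                 # pool to a b x b grid
                  nn.Conv2d(in_ch, out_ch, 1, bias=False),
                  nn.BatchNorm2d(out_ch),
                  nn.ReLU(inplace=True),
              )
              for b in bins
          )

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          h, w = x.shape[-2:]
          pooled = [
              F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
              for stage in self.stages
          ]
          return torch.cat([x, *pooled], dim=1)            # concat input with pooled contexts

  # feats = torch.randn(1, 2048, 32, 32); out = PyramidPooling()(feats)  # [1, 4096, 32, 32]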

OCNet: Object Context for Semantic Segmentation

TLDR
This paper proposes an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices and empirically shows the advantages of this approach with competitive performances on five challenging benchmarks.

Microsoft COCO: Common Objects in Context

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

TLDR
This work addresses the task of semantic image segmentation with deep learning and proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, improving the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
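
As a reference point for the ASPP idea summarized above: several 3x3 atrous convolutions with different dilation rates are applied in parallel to the same feature map and their outputs are fused, so the module covers multiple effective receptive fields at once. A minimal PyTorch sketch (channel sizes and the 6/12/18/24 rates are assumptions, and summation is one possible fusion choice, not necessarily the paper's exact head):

  import torch
  import torch.nn as nn

  class ASPP(nn.Module):
      """Parallel 3x3 atrous convolutions at several dilation rates, summed."""
      def __init__(self, in_ch: int = 512, out_ch: int = 256, rates=(6, 12, 18, 24)):
          super().__init__()
          self.branches = nn.ModuleList(
              nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
              for r in rates
          )

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # padding == dilation keeps all branch outputs at the input resolution.
          return torch.stack([b(x) for b in self.branches], dim=0).sum(dim=0)

  # feats = torch.randn(1, 512, 64, 64); out = ASPP()(feats)  # [1, 256, 64, 64]
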
...