Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

  • Lixiang Ru, Yibing Zhan, Baosheng Yu, Bo Du
  • Published 5 March 2022
  • Computer Science
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Weakly-supervised semantic segmentation (WSSS) with image-level labels is an important and challenging task. Due to their high training efficiency, end-to-end solutions for WSSS have received increasing attention from the community. However, current methods are mainly based on convolutional neural networks and fail to properly explore global information, which usually results in incomplete object regions. In this paper, to address the aforementioned problem, we introduce Transformers, which… 

Background-Mixed Augmentation for Weakly Supervised Change Detection

This work proposes background-mixed augmentation, which is specifically designed for change detection (CD): it augments examples under the guidance of a set of background-changing images, letting deep CD models see diverse environmental variations. It also proposes an augmented & real data consistency loss that significantly improves generalization.

Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation

A novel AFN is presented that adopts a contrast-insensitive approach based on multiscale affinity to jointly model topology and pixel-wise segmentation features. It outperforms state-of-the-art methods on both accuracy and topological metrics and is more robust to various contrast changes than existing methods.

Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation

Weakly Supervised Semantic Segmentation (WSSS) research has explored many directions to improve the typical pipeline of a CNN plus class activation maps (CAM) plus refinements, given the image-class…

eX-ViT: A Novel eXplainable Vision Transformer for Weakly Supervised Semantic Segmentation

This work proposes a novel vision transformer, the eXplainable Vision Transformer (eX-ViT), an intrinsically interpretable model that jointly discovers robust interpretable features and performs prediction. Using only image-level labels, it outperforms state-of-the-art black-box methods in both accuracy and interpretability.

Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation

This work proposes a group ranking-based Out-of-Candidate Rectification (OCR) mechanism that achieves remarkable performance gains on both Pascal VOC and MS COCO with negligible extra training overhead, which demonstrates the effectiveness and generality of OCR.

CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

Experimental results on two widely used radiology report generation (RRG) benchmarks show the superiority of CAMANet over previous studies, and ablation studies further verify the contribution of its individual components.

BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation

This work presents BoxTeacher, an efficient and end-to-end training framework for high-performance weakly supervised instance segmentation, which leverages a sophisticated teacher to generate high-quality masks as pseudo labels, and presents a mask-aware confidence score to estimate the quality of pseudo masks.

Centralized Feature Pyramid for Object Detection

This paper proposes a Centralized Feature Pyramid (CFP) for object detection, which is based on globally explicit centralized feature regulation and can not only capture global long-range dependencies but also obtain an all-round yet discriminative feature representation.

Towards label-efficient automatic diagnosis and analysis: a comprehensive survey of advanced deep learning-based weakly-supervised, semi-supervised and self-supervised techniques in histopathological image analysis

A comprehensive and systematic review of the latest studies on weakly supervised, semi-supervised, and self-supervised learning in computational pathology is presented from both technical and methodological perspectives.

RecurSeed and EdgePredictMix: Single-stage Learning is Sufficient for Weakly-Supervised Semantic Segmentation

This work proposes RecurSeed, which alternately reduces non-detections and false detections through recursive iterations, thereby implicitly finding an optimal junction that minimizes both errors. It also proposes a novel data augmentation approach, EdgePredictMix, which further expresses an object's edge by utilizing the probability difference between adjacent pixels when combining the segmentation results.
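The idea of reading object edges from the probability difference between adjacent pixels can be illustrated with a small sketch. It assumes a per-pixel class-probability map given as H x W x C nested lists and scores each pixel by the largest total-variation distance to a horizontally or vertically adjacent pixel. The function name `adjacent_prob_difference` and this particular distance are illustrative assumptions, not the formulation used by EdgePredictMix.

```python
def adjacent_prob_difference(prob):
    """Hypothetical edge cue, not EdgePredictMix's exact formulation.

    prob: H x W x C nested lists of per-pixel class probabilities.
    Returns an H x W map whose value at each pixel is the largest
    total-variation distance between its class distribution and that of a
    horizontally or vertically adjacent pixel (0 = same, 1 = disjoint).
    """
    h, w = len(prob), len(prob[0])
    edge = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Compare with the right and bottom neighbours; each pair is seen once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    tv = 0.5 * sum(abs(a - b) for a, b in zip(prob[y][x], prob[ny][nx]))
                    # Mark both pixels of the pair so the cue is symmetric.
                    edge[y][x] = max(edge[y][x], tv)
                    edge[ny][nx] = max(edge[ny][nx], tv)
    return edge
```

High values mark pixels where the predicted class distribution changes abruptly, which is where object boundaries are likely to lie.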



Attention Is All You Need

A new simple network architecture, the Transformer, is proposed; it is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. It generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
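The Transformer's core operation is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The sketch below implements a single attention head in plain Python for readability; masking, multi-head projections and batching are deliberately left out, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    q: n x d_k queries, k: m x d_k keys, v: m x d_v values (nested lists).
    Returns an n x d_v list; row i is a weighted average of the value rows.
    """
    d_k = len(q[0])
    out = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output column at a time.
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

Each output row is a convex combination of all value rows, so every position can draw on every other position in a single step.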

Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation

This work proposes a method to reduce the information bottleneck by removing the last activation function of a deep neural network, and introduces a new pooling method that further encourages the transmission of information from non-discriminative regions to the classification.

Adaptive Affinity Loss and Erroneous Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation

This paper introduces an adaptive affinity loss to thoroughly learn the local pairwise affinity of multi-stage approaches in a single-stage model and proposes a novel label reassign loss to mitigate over-fitting.
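Local pairwise affinity supervision is commonly derived from a pseudo-label map: two nearby pixels get target 1 if they share a class and 0 otherwise, and pairs touching an unreliable pixel are skipped. The sketch below shows that generic construction; it is not the adaptive affinity loss of the cited paper, and the ignore value 255, the square neighbourhood and the name `local_affinity_pairs` are assumptions.

```python
IGNORE = 255  # assumed ignore value for unreliable pseudo-label pixels


def local_affinity_pairs(labels, radius=1):
    """Build local pairwise affinity targets from a pseudo-label map.

    labels: H x W nested lists of class indices (IGNORE = unreliable).
    Returns ((y, x), (ny, nx), target) for every unordered pixel pair inside
    a (2*radius+1) x (2*radius+1) window, with target 1 if the two pixels
    share a class and 0 otherwise. Pairs touching IGNORE are skipped.
    """
    h, w = len(labels), len(labels[0])
    pairs = []
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Keep only "forward" offsets so each unordered pair appears once.
                    if (dy, dx) <= (0, 0):
                        continue
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    a, b = labels[y][x], labels[ny][nx]
                    if a == IGNORE or b == IGNORE:
                        continue
                    pairs.append(((y, x), (ny, nx), int(a == b)))
    return pairs
```

A model's predicted affinities for these pairs could then be trained against the targets with, for example, binary cross-entropy.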

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

SegFormer is presented, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders and shows excellent zero-shot robustness on Cityscapes-C.

Single-Stage Semantic Segmentation From Image Labels

  • Nikita Araslanov, S. Roth
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This work develops a segmentation-based network model and a self-supervised training scheme to train for semantic masks from image-level annotations in a single stage, and shows that despite its simplicity, this method achieves results that are competitive with significantly more complex pipelines, substantially outperforming earlier single-stage methods.

Learning Deep Features for Discriminative Localization

In this work, we revisit the global average pooling layer proposed in [13] and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability.
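For class c, the class activation map (CAM) is M_c(x, y) = Σ_k w_k^c f_k(x, y), where f_k is the k-th feature map of the last convolutional layer and w_k^c is the weight connecting the globally average-pooled f_k to class c. The plain-Python sketch below computes M_c and the pooled class score; nested lists stand in for tensors purely for illustration, and the classifier bias is omitted.

```python
def class_activation_map(features, weights, c):
    """CAM for class c: M_c(x, y) = sum_k weights[c][k] * features[k][y][x].

    features: K x H x W feature maps from the last convolutional layer.
    weights:  C x K weights of the linear classifier that follows global
              average pooling (bias omitted).
    """
    num_k, h, w = len(features), len(features[0]), len(features[0][0])
    return [[sum(weights[c][k] * features[k][y][x] for k in range(num_k))
             for x in range(w)]
            for y in range(h)]


def class_score(features, weights, c):
    """Class score after global average pooling: sum_k w_k^c * mean(f_k)."""
    means = [sum(map(sum, f)) / (len(f) * len(f[0])) for f in features]
    return sum(w_k * m for w_k, m in zip(weights[c], means))
```

Because the classifier is linear, the spatial mean of M_c equals the class score, so the locations where M_c is large are exactly the ones that drive the prediction.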

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

This paper considers fully connected CRF models defined on the complete set of pixels in an image and proposes a highly efficient approximate inference algorithm in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels.
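In that paper, the pairwise potential between pixels i and j is a Potts compatibility term [l_i != l_j] multiplied by a sum of two Gaussian kernels: an appearance kernel over position p and colour I, w1 * exp(-|p_i - p_j|^2 / (2 theta_alpha^2) - |I_i - I_j|^2 / (2 theta_beta^2)), and a smoothness kernel over position only, w2 * exp(-|p_i - p_j|^2 / (2 theta_gamma^2)). The sketch below evaluates this potential for a single pixel pair. The default weights and bandwidths are illustrative placeholders rather than tuned values, and the efficient approximate inference that is the paper's main contribution is not shown.

```python
import math


def pairwise_kernel(p_i, p_j, I_i, I_j, w1=1.0, w2=1.0,
                    theta_alpha=60.0, theta_beta=10.0, theta_gamma=3.0):
    """k(f_i, f_j) = appearance kernel + smoothness kernel.

    p_*: pixel positions, I_*: colour vectors. The default weights and
    bandwidths are illustrative placeholders, not tuned values.
    """
    d_pos = sum((a - b) ** 2 for a, b in zip(p_i, p_j))
    d_col = sum((a - b) ** 2 for a, b in zip(I_i, I_j))
    # Appearance kernel: nearby pixels with similar colour likely share a label.
    appearance = w1 * math.exp(-d_pos / (2 * theta_alpha ** 2) - d_col / (2 * theta_beta ** 2))
    # Smoothness kernel: position only; discourages small isolated regions.
    smoothness = w2 * math.exp(-d_pos / (2 * theta_gamma ** 2))
    return appearance + smoothness


def pairwise_potential(l_i, l_j, p_i, p_j, I_i, I_j, **kernel_params):
    """Potts model: a cost is paid only when the two labels differ."""
    if l_i == l_j:
        return 0.0
    return pairwise_kernel(p_i, p_j, I_i, I_j, **kernel_params)
```

Assigning different labels to two close, similarly coloured pixels is therefore expensive, while a strong colour difference makes the appearance term vanish and leaves only the short-range smoothness cost.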

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

This paper deploys a pure transformer to encode an image as a sequence of patches, termed SEgmentation TRansformer (SETR), and shows that SETR achieves a new state of the art on ADE20K and Pascal Context and competitive results on Cityscapes.
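Encoding an image as a sequence of patches amounts to cutting it into non-overlapping P x P tiles and flattening each tile into one vector, in raster order. The sketch below shows only this tokenization step on H x W x C nested lists; the learned linear projection and position embeddings that a Transformer encoder would then apply are omitted, and `image_to_patch_sequence` is a hypothetical name.

```python
def image_to_patch_sequence(image, patch):
    """Split an H x W x C image (nested lists) into non-overlapping
    patch x patch tiles and flatten each tile into a vector of length
    patch * patch * C. Tiles are returned in raster (row-major) order.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    sequence = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten this tile row by row, with channels innermost.
            sequence.append([value
                             for y in range(top, top + patch)
                             for x in range(left, left + patch)
                             for value in image[y][x]])
    return sequence
```

For instance, a 4 x 4 single-channel image with patch size 2 becomes a sequence of four tokens, each a 4-dimensional vector.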

Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach

This work harnesses the image-level labels to produce reliable pixel-level annotations and designs a fully end-to-end network that learns to predict segmentation maps, achieving new state-of-the-art performance on Pascal VOC.
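One generic way to keep only reliable pixel-level pseudo-labels is to accept the arg-max class where the model is confident and mark every other pixel as ignored. The sketch below illustrates that idea under stated assumptions (per-pixel class probabilities as input, a hypothetical threshold `tau=0.7`, and an ignore value of 255); it is not the specific procedure of the cited paper.

```python
IGNORE = 255  # assumed label for pixels excluded from the segmentation loss


def reliable_pseudo_labels(probs, tau=0.7):
    """probs: H x W x C per-pixel class probabilities (assumed normalised).
    Returns an H x W map holding the arg-max class where its probability is
    at least tau, and IGNORE everywhere else.
    """
    labels = []
    for row in probs:
        label_row = []
        for p in row:
            best = max(range(len(p)), key=p.__getitem__)
            label_row.append(best if p[best] >= tau else IGNORE)
        labels.append(label_row)
    return labels
```

Raising `tau` trades coverage for precision: fewer pixels are supervised, but those that are carry fewer label errors.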

Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations

This work proposes IRNet, which estimates rough areas of individual instances and detects boundaries between different object classes. This makes it possible to assign instance labels to the seeds and propagate them within the boundaries, so that the entire areas of instances can be estimated accurately.
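Propagating seed labels within boundaries is often implemented as a random walk: inter-pixel affinities are row-normalised into a transition matrix T, and seed scores are repeatedly multiplied by T, so they spread among strongly connected pixels but not across low-affinity (boundary) pairs. The sketch below assumes the affinity matrix is already given (deriving it from a boundary map is not shown) and uses small nested lists; it illustrates the general technique rather than IRNet's exact implementation, and `random_walk_propagate` is a hypothetical name.

```python
def random_walk_propagate(affinity, scores, steps=3):
    """Spread per-pixel class scores along inter-pixel affinities.

    affinity: N x N non-negative pixel affinities; every row must have a
              positive sum (e.g. include self-affinity on the diagonal).
    scores:   N x C initial class scores, e.g. seed activations.
    Each step replaces a pixel's scores with the affinity-weighted average
    of all pixels' scores, i.e. multiplies by T = D^-1 A.
    """
    n, num_classes = len(affinity), len(scores[0])
    # Row-normalise the affinities into a transition matrix.
    transition = []
    for row in affinity:
        total = sum(row)
        transition.append([a / total for a in row])
    for _ in range(steps):
        scores = [[sum(transition[i][j] * scores[j][c] for j in range(n))
                   for c in range(num_classes)]
                  for i in range(n)]
    return scores
```

Since each row of T sums to 1, every step keeps scores on the same scale; more steps let seeds travel further, while zero-affinity pairs (a boundary) block propagation entirely.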