EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

Both performance and efficiency are important to semantic segmentation. State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN), which adopt dilated convolutions in the backbone networks to extract high-resolution feature maps and achieve high segmentation performance. However, because many convolution operations are conducted on the high-resolution feature maps, such dilatedFCN-based methods result in large…

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

This work proposes a novel decoder, termed the dynamic neural representational decoder (NRD), which is simple yet significantly more efficient, achieving competitive performance with methods using dilated encoders at only ∼15% of the computational cost.

Fully Transformer Networks for Semantic Image Segmentation

This work proposes a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features, while reducing the computational complexity of the standard vision transformer (ViT), achieving new state-of-the-art results on multiple challenging semantic segmentation benchmarks.

Lightweight Self-Attention Network for Semantic Segmentation

The Lightweight Self-Attention Module (LSAM) captures information using a hand-designed compact feature representation and a weighted fusion of position information; in the decoder structure, an improved up-sampling module is proposed.

Pyramid Fusion Transformer for Semantic Segmentation

This study finds that a per-mask classification decoder on top of a single-scale feature is not effective enough to extract reliable probabilities or masks, and proposes a transformer-based Pyramid Fusion Transformer (PFT) for per-mask semantic segmentation on top of multi-scale features.

CGFNet: cross-guided fusion network for RGB-thermal semantic segmentation

This work proposes a novel cross-guided fusion attention network for RGB-T semantic segmentation, which uses an attention mechanism to extract the weights of the two modalities so that they guide each other, and proposes a dual decoder to decode the features of the different modalities.

ES-CRF: Embedded Superpixel CRF for Semantic Segmentation

This work dives deep into the problem of boundary pixels and proposes a novel method named Embedded Superpixel CRF (ES-CRF), which innovatively fuses the CRF mechanism into the CNN network as an organic whole for more effective end-to-end optimization.

SegViT: Semantic Segmentation with Plain Vision Transformers

This work explores the capability of plain Vision Transformers for semantic segmentation, and proposes the Attention-to-Mask (ATM) module, in which the similarity maps between a set of learnable class tokens and the spatial feature maps are transferred to the segmentation masks.
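The core of the ATM idea, as summarized above, is that the token-to-pixel similarity map computed for attention can double as a segmentation mask. A minimal numpy sketch of this reuse (all names and the sigmoid-over-similarity formulation here are illustrative assumptions, not the paper's exact module):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_to_mask(class_tokens, feat):
    """Hypothetical sketch of the Attention-to-Mask idea: the
    token-to-pixel similarity map that would normally weight an
    attention update is reused directly as a soft segmentation mask.

    class_tokens: (K, D) learnable tokens, one per class
    feat:         (H, W, D) spatial feature map
    Returns (K, H, W) per-class soft masks in (0, 1)."""
    H, W, D = feat.shape
    # Scaled dot-product similarity between each token and each pixel.
    sim = class_tokens @ feat.reshape(-1, D).T / np.sqrt(D)  # (K, H*W)
    return sigmoid(sim).reshape(-1, H, W)

rng = np.random.default_rng(0)
masks = attention_to_mask(rng.normal(size=(21, 64)), rng.normal(size=(8, 8, 64)))
print(masks.shape)  # (21, 8, 8)
```

In the actual SegViT decoder the similarity maps come from multi-head attention layers and are supervised with a mask loss; this sketch only shows the shape of the transfer from attention scores to masks.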

Bilateral attention network for semantic segmentation

This work proposes the bilateral attention network for semantic segmentation, which embeds two attention modules in the encoder and decoder structures and fuses the features of the two attention modules to obtain more accurate segmentation results.

Multiple-Attention Mechanism Network for Semantic Segmentation

This paper proposes a multiple-attention mechanism network (MANet) for semantic segmentation in a very effective and efficient way, achieving mIoU scores of 75.5% and 72.8% on the PASCAL VOC 2012 and Cityscapes datasets, respectively.

Feature Selective Transformer for Semantic Image Segmentation

This work focuses on fusing multi-scale features from Transformer-based backbones for semantic segmentation, and proposes a Feature Selective Transformer (FeSeFormer), which aggregates features from all scales (or levels) for each query feature.

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

This work proposes data-dependent upsampling (DUpsampling) to replace bilinear upsampling, which takes advantage of the redundancy in the label space of semantic segmentation and is able to recover pixel-wise predictions from the low-resolution outputs of CNNs.
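The mechanism summarized above replaces bilinear interpolation with a learned linear map: each low-resolution score vector is projected to an r × r patch of full-resolution predictions. A numpy sketch of that reshaping, with W standing in for the matrix the paper learns from label-space redundancy (function and variable names are illustrative):

```python
import numpy as np

def dupsample(scores_lr, W, r):
    """Hypothetical sketch of data-dependent upsampling (DUpsampling).

    scores_lr: (h, w, C) low-resolution network outputs
    W:         (C, r*r*N) linear upsampling matrix; in the paper this
               is learned by exploiting redundancy in the label space
    Returns (h*r, w*r, N) full-resolution per-class predictions."""
    h, w, C = scores_lr.shape
    N = W.shape[1] // (r * r)
    # Each low-res cell maps to an r x r patch of N-class scores.
    patches = scores_lr.reshape(-1, C) @ W            # (h*w, r*r*N)
    patches = patches.reshape(h, w, r, r, N)
    # Interleave the r x r patches back into a full-resolution grid.
    return patches.transpose(0, 2, 1, 3, 4).reshape(h * r, w * r, N)

rng = np.random.default_rng(0)
out = dupsample(rng.normal(size=(4, 4, 32)),
                rng.normal(size=(32, 2 * 2 * 21)), r=2)
print(out.shape)  # (8, 8, 21)
```

Because the projection is linear, it can be folded into the final 1×1 convolution at inference time, which is part of why the decoder stays cheap.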

Multi-scale Context Intertwining for Semantic Segmentation

This work proposes a novel scheme for aggregating features from different scales, referred to as Multi-Scale Context Intertwining (MSCI), which merges pairs of feature maps in a bidirectional and recurrent fashion via connections between two LSTM chains.

RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation

RefineNet is presented, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections and introduces chained residual pooling, which captures rich background context in an efficient manner.

Context Encoding for Semantic Segmentation

The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN, and can improve the feature representation of relatively shallow networks for the image classification on CIFAR-10 dataset.

Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation

This work shows how to improve semantic segmentation through the use of contextual information, specifically 'patch-patch' context between image regions and 'patch-background' context, and formulates Conditional Random Fields with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches.

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
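The "fully convolutional" insight above can be made concrete with a toy numpy head: once every layer is a convolution (here, a 1×1 per-pixel linear map) followed by upsampling, any input size yields a correspondingly sized score map. All names and the nearest-neighbour upsampling are simplifying assumptions (the paper uses learned bilinear/deconvolution upsampling):

```python
import numpy as np

def conv1x1(feat, weight, bias):
    """1x1 convolution: the same linear map applied at every pixel.

    feat:   (H, W, C_in) feature map
    weight: (C_in, C_out), bias: (C_out,)"""
    return feat @ weight + bias

def fcn_head(feat, weight, bias, stride=8):
    """Hypothetical FCN scoring head: a 1x1 conv produces per-class
    scores at feature resolution, then nearest-neighbour upsampling
    restores the input resolution (the paper learns this upsampling)."""
    scores = conv1x1(feat, weight, bias)              # (H, W, n_classes)
    # Upsample by repeating each cell `stride` times along H and W.
    return scores.repeat(stride, axis=0).repeat(stride, axis=1)

# Because every layer is convolutional, arbitrary input sizes work:
rng = np.random.default_rng(0)
w, b = rng.normal(size=(64, 21)), np.zeros(21)
for h, wdt in [(4, 6), (7, 5)]:
    out = fcn_head(rng.normal(size=(h, wdt, 64)), w, b)
    print(out.shape)  # (h*8, wdt*8, 21)
```

Contrast this with a fully connected classifier head, which fixes the input size and discards spatial layout; the 1×1 convolution keeps both.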

Rethinking Atrous Convolution for Semantic Image Segmentation

The proposed `DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
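The atrous (dilated) convolution that DeepLabv3 builds on can be sketched in one dimension: spacing the kernel taps `rate` samples apart enlarges the receptive field without shrinking the output, which is the property the dilatedFCN family uses in place of downsampling. A minimal numpy sketch (zero padding and the function name are illustrative choices):

```python
import numpy as np

def atrous_conv1d(x, kernel, rate=1):
    """Hypothetical 1-D atrous (dilated) convolution with zero padding.

    Taps are spaced `rate` samples apart, so the receptive field grows
    to rate*(k-1)+1 while the output keeps the input's length."""
    k = len(kernel)
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([
        sum(kernel[j] * xp[i + j * rate] for j in range(k))
        for i in range(len(x))
    ])

x = np.arange(8, dtype=float)
# Same kernel, same output length; only the spacing of taps differs.
print(atrous_conv1d(x, np.array([1., 1., 1.]), rate=1))
print(atrous_conv1d(x, np.array([1., 1., 1.]), rate=2))
```

Stacking such layers with increasing rates (as in ASPP) samples context at multiple scales while every output stays at full resolution.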

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

This work addresses the task of semantic image segmentation with Deep Learning and proposes atrous spatial pyramid pooling (ASPP), which is proposed to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.

Semantic Image Segmentation via Deep Parsing Network

This paper addresses semantic image segmentation by incorporating rich information into a Markov Random Field (MRF), including high-order relations and a mixture of label contexts, and proposes a Convolutional Neural Network (CNN), namely the Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass.

Learning Deconvolution Network for Semantic Segmentation

A novel semantic segmentation algorithm that learns a deep deconvolution network on top of the convolutional layers adopted from the VGG 16-layer net, demonstrating outstanding performance on the PASCAL VOC 2012 dataset.