A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

@article{Liu2021AHD,
  title={A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection},
  author={Jianbo Liu and Sijie Ren and Yuanjie Zheng and Xiaogang Wang and Hongsheng Li},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2021},
  volume={PP}
}
Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. In this paper, we propose a novel holistically-guided decoder that obtains high-resolution, semantic-rich feature maps from the multi-scale features of the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantage of both the high-level and low-level features from the encoder…
1 Citation
A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark
TLDR
This work presents UTNetV2 as a data-scalable Transformer towards generalizable medical image segmentation, and makes the data processing, models and evaluation pipeline publicly available, offering solid baselines and unbiased comparisons for promoting a wide range of downstream clinical applications.

References

SHOWING 1-10 OF 48 REFERENCES
Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation
TLDR
This work proposes data-dependent upsampling (DUpsampling) to replace bilinear upsampling; it takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs.
Context Encoding for Semantic Segmentation
TLDR
The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN, and can improve the feature representation of relatively shallow networks for image classification on the CIFAR-10 dataset.
Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation
TLDR
This work shows how to improve semantic segmentation through the use of contextual information, specifically 'patch-patch' context between image regions and 'patch-background' context, and formulates Conditional Random Fields with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches.
RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation
TLDR
This work presents RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections, and introduces chained residual pooling, which captures rich background context in an efficient manner.
Multi-scale Context Intertwining for Semantic Segmentation
TLDR
This work proposes a novel scheme for aggregating features from different scales, referred to as Multi-Scale Context Intertwining (MSCI), which merges pairs of feature maps in a bidirectional and recurrent fashion via connections between two LSTM chains.
Feature Pyramid Networks for Object Detection
TLDR
This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
TLDR
Quantitative assessments show that SegNet provides good performance with competitive inference time and the most memory-efficient inference compared to other architectures, including FCN and DeconvNet.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
TLDR
This work addresses the task of semantic image segmentation with deep learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
Dual Attention Network for Scene Segmentation
TLDR
New state-of-the-art segmentation performance is achieved on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context, and COCO Stuff, without using coarse data.
FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
TLDR
This work proposes a novel joint upsampling module named Joint Pyramid Upsampling (JPU), which achieves state-of-the-art performance on the Pascal Context and ADE20K datasets while running 3 times faster.