Corpus ID: 235254713

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

@article{Xie2021SegFormerSA,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Enze Xie and Wenhai Wang and Zhiding Yu and Anima Anandkumar and Jos{\'e} Manuel {\'A}lvarez and Ping Luo},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.15203}
}
We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes which leads to decreased performance when the testing resolution differs from training…
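The abstract describes a hierarchical encoder that emits multiscale features and a lightweight all-MLP decoder. A minimal NumPy sketch of that decoder idea follows: each pyramid stage is projected to a common channel dimension with a linear layer, upsampled to the finest resolution, concatenated, fused by one more linear layer, and classified per pixel. All names, shapes, weight initializations, and the nearest-neighbour upsampling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w):
    # pointwise (1x1) projection over the channel axis: (H, W, Cin) -> (H, W, Cout)
    return x @ w

def upsample_nearest(x, factor):
    # nearest-neighbour upsampling along the two spatial axes
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def mlp_decoder(features, embed_dim=64, num_classes=19):
    """features: list of (H_i, W_i, C_i) maps, finest first."""
    base_h = features[0].shape[0]                      # finest spatial resolution
    unified = []
    for f in features:
        w = rng.standard_normal((f.shape[-1], embed_dim)) * 0.01
        u = linear(f, w)                               # unify channel dimensions
        u = upsample_nearest(u, base_h // u.shape[0])  # match finest resolution
        unified.append(u)
    fused = np.concatenate(unified, axis=-1)           # (H, W, 4 * embed_dim)
    w_fuse = rng.standard_normal((fused.shape[-1], embed_dim)) * 0.01
    w_cls = rng.standard_normal((embed_dim, num_classes)) * 0.01
    return linear(linear(fused, w_fuse), w_cls)        # per-pixel class scores

# four pyramid stages, e.g. strides 4/8/16/32 of a 128x128 input
feats = [rng.standard_normal((32 // 2**i, 32 // 2**i, 32 * 2**i)) for i in range(4)]
logits = mlp_decoder(feats)
print(logits.shape)  # (32, 32, 19)
```

The point of the sketch is that the decoder needs only linear projections and upsampling, with no convolutions or attention; the heavy lifting is left to the hierarchical encoder.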
CycleMLP: A MLP-like Architecture for Dense Prediction
CycleMLP aims to provide a competitive baseline on object detection, instance segmentation, and semantic segmentation for MLP models, and expands MLP-like models' applicability, making them a versatile backbone for dense prediction tasks.
A Unified Efficient Pyramid Transformer for Semantic Segmentation
This work advocates a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts, adapting a sparse sampling strategy to incorporate the transformer-based attention mechanism for efficient context modeling.
Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance
A wearable system with a novel dual-head Transformer for Transparency (Trans4Trans) perception model, which can segment general and transparent objects and maintain efficiency on a portable GPU, demonstrating its high efficiency and robustness for real-world transportation applications.
Trans4Trans: Efficient Transformer for Transparent Object Segmentation to Help Visually Impaired People Navigate in the Real World
A wearable system with a novel dual-head Transformer for Transparency (Trans4Trans) model, capable of segmenting general and transparent objects and performing real-time wayfinding to assist people walking alone more safely.
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
The proposed model, named Polyp-PVT, effectively suppresses noise in the features, significantly improves their expressive capabilities, and achieves new state-of-the-art performance.
MISSFormer: An Effective Medical Image Segmentation Transformer
  • Xiaohong Huang, Zhifang Deng, Dandan Li, Xueguang Yuan
  • Computer Science
  • ArXiv
  • 2021
MISSFormer is a hierarchical encoder-decoder network with two appealing designs: a feed-forward network redesigned with the proposed Enhanced Transformer Block, which aligns features adaptively and enhances the long-range dependencies and local context of the multi-scale features generated by the hierarchical transformer encoder.
Semantic Segmentation on VSPW Dataset through Aggregation of Transformer Models
This work performs semantic segmentation on the VSPW dataset using Transformers and model aggregation, achieving 57.35% mIoU and ranking 3rd in the Video Scene Parsing in the Wild Challenge.
Efficient Transformer for Remote Sensing Image Segmentation
The experimental results show that the proposed Efficient Transformer has an advantage in remote sensing image segmentation, achieving a trade-off between computational complexity (FLOPs) and accuracy (Efficient-L obtains a 3.23% mIoU improvement on Vaihingen and 2.46% on Potsdam).
Hybrid Local-Global Transformer for Image Dehazing
This paper proposes a new ViT architecture, named Hybrid Local-Global Vision Transformer (HyLoG-ViT), for single image dehazing, and proposes a complementary features selection module (CFSM) to select the features useful for image dehazing.
OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification
This work proposes an Omni-Relational High-Order Transformer (OH-Former) to model omni-relational features for ReID, together with a convolution-based local relation perception module to extract local relations and 2D position information.

References

Showing 1–10 of 81 references
Squeeze-and-Attention Networks for Semantic Segmentation
  • Zilong Zhong, Z. Lin, +4 authors A. Wong
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
A novel squeeze-and-attention network (SANet) architecture is proposed that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction.
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries, and applies the depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
CGNet: A Light-Weight Context Guided Network for Semantic Segmentation
This work proposes a novel Context Guided Network (CGNet), a light-weight and efficient network for semantic segmentation that captures contextual information in all stages of the network.
Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells
This work focuses on dense per-pixel tasks, in particular semantic image segmentation using fully convolutional networks, relying on a progressive strategy that stops non-promising architectures from being trained further, and on Polyak averaging coupled with knowledge distillation to speed up convergence.
Context Encoding for Semantic Segmentation
The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN, and can improve the feature representation of relatively shallow networks for image classification on the CIFAR-10 dataset.
Dual Attention Network for Scene Segmentation
New state-of-the-art segmentation performance is achieved on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context, and COCO Stuff, without using coarse data.
Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network
This work proposes a Global Convolutional Network to address both the classification and localization issues in semantic segmentation, and suggests a residual-based boundary refinement to further refine object boundaries.
RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation
RefineNet is presented: a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections, and introduces chained residual pooling, which captures rich background context efficiently.
Fully Convolutional Networks for Semantic Segmentation
It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Adaptive Pyramid Context Network for Semantic Segmentation
This paper introduces three desirable properties of context features in the segmentation task and finds that Global-guided Local Affinity (GLA) plays a vital role in constructing effective context features, a property largely ignored in previous works.