• Corpus ID: 237091273

SOTR: Segmenting Objects with Transformers

Ruohao Guo, Dantong Niu, Liao Qu, Zhenbo Li
Most recent transformer-based models show impressive performance on vision tasks, even better than Convolutional Neural Networks (CNNs). In this work, we present a novel, flexible, and effective transformer-based model for high-quality instance segmentation. The proposed method, Segmenting Objects with TRansformers (SOTR), simplifies the segmentation pipeline, building on an alternative CNN backbone appended with two parallel subtasks: (1) predicting per-instance category via transformer and (2…

Mask Transfiner for High-Quality Instance Segmentation
Instead of operating on regular dense tensors, the Mask Transfiner decomposes and represents the image regions as a quadtree, which allows it to predict highly accurate instance masks, at a low computational cost.
Unsupervised Domain Adaptation for Semantic Image Segmentation: a Comprehensive Survey
This survey summarizes five years of work in this rapidly growing field, which reflects both the importance of semantic segmentation itself and the critical need to adapt segmentation models to new environments.


BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation
The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.
Mask Encoding for Single Shot Instance Segmentation
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact, fixed-dimensional representation vector, which allows the instance segmentation task to be incorporated into one-stage bounding-box detectors and results in a simple yet efficient instance segmentation framework.
Panoptic Feature Pyramid Networks
This work endows Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone, and shows it is a robust and accurate baseline for both tasks.
CenterMask: Single Shot Instance Segmentation With Point Representation
This paper decomposes the instance segmentation into two parallel subtasks: Local Shape prediction that separates instances even in overlapping conditions, and Global Saliency generation that segments the whole image in a pixel-to-pixel manner.
FCOS: Fully Convolutional One-Stage Object Detection
For the first time, a much simpler and more flexible detection framework achieving improved detection accuracy is demonstrated, and it is hoped that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks.
CenterMask: Real-Time Anchor-Free Instance Segmentation
We propose a simple yet efficient anchor-free instance segmentation method, called CenterMask, that adds a novel spatial attention-guided mask (SAG-Mask) branch to an anchor-free one-stage object detector
PolarMask: Single Shot Instance Segmentation With Polar Representation
Enze Xie, Pei Sun, +5 authors, Ping Luo. Computer Science. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used by easily embedding it into most
Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth
This work proposes a new clustering loss function for proposal-free instance segmentation that pulls the spatial embeddings of pixels belonging to the same instance together and jointly learns an instance-specific clustering bandwidth, maximizing the intersection-over-union of the resulting instance mask.
Axial Attention in Multidimensional Transformers
This work proposes Axial Transformers, a self-attention-based autoregressive model for images and other data organized as high-dimensional tensors that maintains both full expressiveness over joint distributions over data and ease of implementation with standard deep learning frameworks, while requiring reasonable memory and computation.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.