DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

  • Xing Shen, Jirui Yang, Chunbo Wei, Bing Deng, Jianqiang Huang, Xiansheng Hua, Xiaoliang Cheng, Kewei Liang
  • Published 19 November 2020
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Binary grid mask representation is broadly used in instance segmentation. A representative instantiation is Mask R-CNN, which predicts masks on a 28×28 binary grid. Generally, a low-resolution grid is not sufficient to capture the details, while a high-resolution grid dramatically increases the training complexity. In this paper, we propose a new mask representation by applying the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector. Our method… 
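The encoding idea in the abstract (DCT of a high-resolution binary mask, kept as a short vector) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the mask is already resized to a square n×n grid, keeps the first `n_coeffs` coefficients in JPEG-style zigzag order (low frequencies first), and binarizes the inverse transform at 0.5. The helpers `encode_mask` and `decode_mask` are hypothetical names, and the 128×128 grid and 300-value vector in the usage code are illustrative choices.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # spatial index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # rescale the DC row so that C @ C.T == I
    return c


def zigzag_indices(n: int) -> list:
    """(row, col) pairs of an n x n grid in JPEG zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diag = [(r, s - r) for r in range(max(0, s - n + 1), min(s, n - 1) + 1)]
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def encode_mask(mask: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Hypothetical helper: 2-D DCT of a square mask, truncated to n_coeffs values."""
    n = mask.shape[0]
    c = dct_matrix(n)
    coeffs = c @ mask.astype(np.float64) @ c.T  # separable 2-D DCT-II
    rows, cols = zip(*zigzag_indices(n)[:n_coeffs])
    return coeffs[list(rows), list(cols)]


def decode_mask(vec: np.ndarray, n: int, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical helper: scatter coefficients back, invert the DCT, binarize."""
    c = dct_matrix(n)
    coeffs = np.zeros((n, n))
    rows, cols = zip(*zigzag_indices(n)[: len(vec)])
    coeffs[list(rows), list(cols)] = vec
    return (c.T @ coeffs @ c) >= threshold  # inverse 2-D DCT, then threshold


if __name__ == "__main__":
    # Illustrative sizes only: a 128x128 square mask compressed to 300 numbers.
    mask = np.zeros((128, 128))
    mask[32:96, 32:96] = 1
    vec = encode_mask(mask, 300)
    recon = decode_mask(vec, 128)
    iou = np.logical_and(recon, mask).sum() / np.logical_or(recon, mask).sum()
    print(vec.shape, round(float(iou), 3))
```

Because the basis is orthonormal, keeping all n² coefficients reconstructs the mask exactly; truncating to the low-frequency prefix trades fine boundary detail for a fixed-length target that a network head can regress.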

Recurrent Contour-based Instance Segmentation with Progressive Learning

The results demonstrate that the proposed PolySnake outperforms existing contour-based instance segmentation methods on several prevalent instance segmentation benchmarks.

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

This work devises a joint Caption Grounding and Generation (CGG) framework, built on a Mask Transformer baseline, with a novel grounding loss that performs explicit and implicit multi-modal feature alignment; the framework achieves a large improvement of 6.8% mAP on novel classes without extra caption data.

Painterly Image Harmonization in Dual Domains

A novel painterly harmonization network consisting of a dual-domain generator and a dual-domain discriminator, which harmonizes the composite image in both the spatial and frequency domains; the results show the effectiveness of the method.

Improving Multiple Machine Vision Tasks in the Compressed Domain

This paper improves machine vision tasks in the compressed domain, achieving better rate-accuracy/distortion and lower complexity than the state-of-the-art pixel-domain work that handles both machine and human vision tasks.

RGB no more: Minimally-decoded JPEG Vision Transformers

This work focuses on training Vision Transformers (ViT) directly from the encoded features of JPEG, and tackles data augmentation directly on these encoded features, which, to our knowledge, has not been explored in depth for training in this setting.

Semantic Communication Enabling Robust Edge Intelligence for Time-Critical IoT Applications

The proposed edge intelligence framework, which uses semantic communication for time-critical IoT applications, outperforms the conventional approach under latency and data-rate constraints, particularly under ultra-stringent deadlines and low data rates.

Global Spectral Filter Memory Network for Video Object Segmentation

This paper proposes the Global Spectral Filter Memory network (GSFM), which improves intra-frame interaction by learning long-term spatial dependencies in the spectral domain, and introduces a Low (High) Frequency Module tailored to this setting.
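One common way to learn spatial dependencies in the spectral domain is a global filter: take the 2-D FFT of a feature map, multiply it element-wise by a filter, and transform back, so every output position depends on every input position. The NumPy sketch below assumes that formulation plus a simple circular low/high-frequency split; it is a generic illustration, not GSFM's published code, and `global_filter` and `split_frequencies` are hypothetical names. In a trained network the filter `weight` would be a learned parameter.

```python
import numpy as np


def global_filter(feat: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Apply a per-channel global filter in the frequency domain.

    feat: (C, H, W) real feature map.
    weight: (C, H, W // 2 + 1) complex filter (learned in a real model).
    """
    spec = np.fft.rfft2(feat, axes=(-2, -1))  # real-input 2-D FFT over H, W
    # An element-wise product in frequency equals a circular convolution over
    # the whole map, so the receptive field is global.
    return np.fft.irfft2(spec * weight, s=feat.shape[-2:], axes=(-2, -1))


def split_frequencies(feat: np.ndarray, radius: float):
    """Split (C, H, W) features into low- and high-frequency parts."""
    _, h, w = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    yy, xx = np.mgrid[0:h, 0:w]
    low_pass = np.hypot(yy - h // 2, xx - w // 2) <= radius  # centred on the DC term
    low = np.fft.ifft2(np.fft.ifftshift(spec * low_pass, axes=(-2, -1)), axes=(-2, -1)).real
    return low, feat - low  # the high-frequency part is the residual
```

An all-ones filter leaves the features unchanged, and the two parts returned by `split_frequencies` always sum back to the input, which makes the split easy to sanity-check.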

FMNet: Frequency-Aware Modulation Network for SDR-to-HDR Translation

A frequency-aware modulation block that dynamically modulates features according to their frequency-domain responses is designed to enhance contrast in a frequency-adaptive way for SDR-to-HDR translation and to reduce structural distortions and artifacts in the translated low-frequency regions.

SATMask: Spatial Attention Transform Mask for Dense Instance Segmentation

An anchor-free, single-shot dense image segmentation framework, named SATMask, which adds a Spatial Attention Transform (SAT) mask head to the anchor-free one-stage object detector FCOS to predict high-quality instance masks with low complexity, and uses a feature-aligned pyramid network to fuse the backbone's feature maps, obtaining rich spatial details and better semantic information.

MFEAFN: Multi-scale feature enhanced adaptive fusion network for image semantic segmentation

This paper proposes a multi-scale feature-enhanced adaptive fusion network named MFEAFN to improve semantic segmentation performance, and designs a Double Spatial Pyramid Module (DSPM) to extract more high-level semantic information.



Mask Encoding for Single Shot Instance Segmentation

Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact, fixed-dimensional representation vector, which allows the instance segmentation task to be incorporated into one-stage bounding-box detectors and results in a simple yet efficient instance segmentation framework.
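The summary does not say how the mask is compressed. To our understanding MEInst uses principal component analysis (PCA) for this step, so the sketch below assumes PCA: it flattens a set of training masks, learns the top-k principal directions with NumPy's SVD, and encodes or decodes a mask as a k-dimensional vector. `PCAMaskCodec`, the 16×16 mask size, and the component count are hypothetical, illustrative choices rather than MEInst's settings.

```python
import numpy as np


class PCAMaskCodec:
    """Hypothetical PCA codec: fixed-length vectors <-> binary masks."""

    def __init__(self, n_components: int):
        self.k = n_components

    def fit(self, masks: np.ndarray) -> "PCAMaskCodec":
        """masks: (N, H, W) binary training masks."""
        x = masks.reshape(len(masks), -1).astype(np.float64)
        self.mean = x.mean(axis=0)
        # Rows of vt are principal directions, sorted by decreasing singular value.
        _, _, vt = np.linalg.svd(x - self.mean, full_matrices=False)
        self.components = vt[: self.k]
        self.shape = masks.shape[1:]
        return self

    def encode(self, mask: np.ndarray) -> np.ndarray:
        """Project a centred, flattened mask onto the kept components."""
        return self.components @ (mask.reshape(-1) - self.mean)

    def decode(self, vec: np.ndarray, threshold: float = 0.5) -> np.ndarray:
        """Reconstruct from the vector, then binarize."""
        flat = self.mean + vec @ self.components
        return flat.reshape(self.shape) >= threshold
```

A detector head would then regress the k-dimensional vector per box, which is what lets a one-stage detector output masks without a per-pixel decoder.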

Conditional Convolutions for Instance Segmentation

A simpler instance segmentation method that achieves improved accuracy and inference speed on the COCO dataset and outperforms a few recent methods, including well-tuned Mask R-CNN baselines, without needing longer training schedules.
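The idea behind conditional convolutions is that the mask head's filters are not fixed: a controller predicts a separate small set of convolution weights for each detected instance, and those weights are applied to a mask feature map shared by all instances. The NumPy sketch below assumes a head made of 1×1 convolutions (a per-pixel MLP) with ReLU between layers and a sigmoid output; the layer widths `[8, 8, 8, 1]` and the names `unpack_params` and `dynamic_mask_head` are illustrative assumptions, not CondInst's exact configuration.

```python
import numpy as np


def unpack_params(flat: np.ndarray, dims: list) -> list:
    """Split one instance's flat parameter vector into (weight, bias) pairs.

    dims lists channel counts per layer, e.g. [8, 8, 8, 1] (an assumption).
    """
    params, pos = [], 0
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        weight = flat[pos : pos + d_in * d_out].reshape(d_out, d_in)
        pos += d_in * d_out
        bias = flat[pos : pos + d_out]
        pos += d_out
        params.append((weight, bias))
    return params


def dynamic_mask_head(feat: np.ndarray, params: list) -> np.ndarray:
    """Run an instance-specific stack of 1x1 convolutions over shared features.

    feat: (C, H, W) mask-branch features shared by all instances.
    Returns an (H, W) array of mask probabilities for this one instance.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)  # each pixel becomes a column vector
    for i, (weight, bias) in enumerate(params):
        x = weight @ x + bias[:, None]  # a 1x1 conv is a matrix product per pixel
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU between layers
    return 1.0 / (1.0 + np.exp(-x.reshape(h, w)))  # sigmoid on the single output channel
```

In a full model the controller branch would predict `flat` for every detected instance; random vectors can stand in for it when experimenting. Two instances with different parameters produce different masks from the same shared features, which is the point of the design.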

Boundary-preserving Mask R-CNN

A conceptually simple yet effective Boundary-preserving Mask R-CNN (BMask R-CNN) that leverages object boundary information to improve mask localization accuracy in instance segmentation.

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.
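The blending step can be sketched as follows: a few shared bases, cropped to an instance's region, are combined using that instance's attention maps, which are normalized across the bases with a per-pixel softmax and then used as weights in a per-pixel sum. The sketch assumes the attention logits have already been predicted and resized to the bases' resolution, and it omits the RoI cropping itself; `blend` is a hypothetical helper, not BlendMask's code.

```python
import numpy as np


def blend(bases: np.ndarray, attn_logits: np.ndarray) -> np.ndarray:
    """Combine K bases with per-instance attention.

    bases: (K, R, R) bases cropped to one instance's region.
    attn_logits: (K, R, R) attention logits for that instance.
    Returns (R, R) mask logits.
    """
    a = attn_logits - attn_logits.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(a) / np.exp(a).sum(axis=0, keepdims=True)  # softmax across the K bases
    return (bases * attn).sum(axis=0)  # per-pixel weighted sum of bases
```

With uniform logits the result is simply the mean of the bases, and a dominant logit for one base selects that base, so the attention acts as a soft, spatially varying choice among a handful of channels.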

Mask R-CNN

This work presents a conceptually simple, flexible, and general framework for object instance segmentation that outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners.

PolarMask: Single Shot Instance Segmentation With Polar Representation

  • Enze Xie, Pei Sun, Ping Luo
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
In this paper, we introduce an anchor-box-free, single-shot instance segmentation method, which is conceptually simple, fully convolutional, and can be used by easily embedding it into most…

Exploring Semantic Segmentation on the DCT Representation

This paper is the first to explore semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard, and its model achieves accuracy close to that of the RGB model at about the same network complexity.
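The JPEG DCT representation is built from 8×8 blocks: each 8-bit channel is level-shifted by 128, split into 8×8 blocks, and each block is transformed with a 2-D DCT-II whose normalization matches the orthonormal DCT for N = 8. The NumPy sketch below computes these unquantized coefficients for one grayscale channel and lays them out as a grid of 64-channel vectors, one per block, a layout that a DCT-domain network could consume. It skips color conversion, chroma subsampling, quantization, and entropy coding, and assumes dimensions divisible by 8; `blockwise_dct` is a hypothetical helper, and the row-major coefficient order is a simplification of JPEG's zigzag order.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis (rows are cosine basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def blockwise_dct(gray: np.ndarray) -> np.ndarray:
    """Unquantized 8x8 block DCT of one channel, returned as (H/8, W/8, 64).

    Assumes H and W are multiples of 8; the 64 coefficients per block are kept
    in row-major (u, v) order, not JPEG's zigzag order.
    """
    h, w = gray.shape
    c = dct_matrix(8)
    shifted = gray.astype(np.float64) - 128.0  # JPEG level shift for 8-bit samples
    # (H, W) -> (H/8, W/8, 8, 8): blocks[i, j] is the block at block-row i, block-col j.
    blocks = shifted.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    coeffs = c @ blocks @ c.T  # matmul broadcasts over the grid of blocks
    return coeffs.reshape(h // 8, w // 8, 64)
```

For a flat region only the DC coefficient (channel 0) is nonzero, which is one reason DCT-domain inputs are compact for smooth image content.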

Image segmentation based on situational DCT descriptors

  • Jie Wei
  • Computer Science
  • Pattern Recognit. Lett.
  • 2002

SOLOv2: Dynamic and Fast Instance Segmentation

State-of-the-art results in object detection (from the authors' mask byproduct) and panoptic segmentation show the method's potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation.