DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
@article{Shen2020DCTMaskDC, title={DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation}, author={Xing Shen and Jirui Yang and Chunbo Wei and Bing Deng and Jianqiang Huang and Xiansheng Hua and Xiaoliang Cheng and Kewei Liang}, journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2020}, pages={8716-8725} }
Binary grid mask representation is broadly used in instance segmentation. A representative instantiation is Mask R-CNN, which predicts masks on a 28×28 binary grid. Generally, a low-resolution grid is not sufficient to capture the details, while a high-resolution grid dramatically increases the training complexity. In this paper, we propose a new mask representation by applying the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector. Our method…
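The encoding idea in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the helper names (`dct_matrix`, `low_freq_order`, `encode_mask`, `decode_mask`, `demo_iou`), the 128×128 grid, the 300-dimensional vector, and the low-frequency-first ordering of coefficients are all assumptions chosen for the example.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return m


def low_freq_order(n):
    """Row/column indices of an n x n grid, sorted by u + v (low frequencies first)."""
    idx = [(u, v) for u in range(n) for v in range(n)]
    idx.sort(key=lambda p: (p[0] + p[1], p[0]))
    rows, cols = zip(*idx)
    return np.array(rows), np.array(cols)


def encode_mask(mask, dim):
    """Encode a square binary mask as its first `dim` low-frequency DCT coefficients."""
    n = mask.shape[0]
    d = dct_matrix(n)
    coeffs = d @ mask.astype(np.float64) @ d.T  # separable 2D DCT-II
    rows, cols = low_freq_order(n)
    return coeffs[rows[:dim], cols[:dim]]


def decode_mask(vec, n):
    """Zero-fill the dropped coefficients, invert the DCT, and threshold at 0.5."""
    d = dct_matrix(n)
    coeffs = np.zeros((n, n))
    rows, cols = low_freq_order(n)
    coeffs[rows[:len(vec)], cols[:len(vec)]] = vec
    recon = d.T @ coeffs @ d  # inverse of the orthonormal 2D DCT
    return recon >= 0.5


def demo_iou(n=128, dim=300):
    """IoU between a synthetic disc mask and its reconstruction from `dim` coefficients."""
    yy, xx = np.mgrid[:n, :n]
    mask = ((yy - n / 2) ** 2 + (xx - n / 2) ** 2) <= (n / 3) ** 2
    recon = decode_mask(encode_mask(mask, dim), n)
    inter = np.logical_and(mask, recon).sum()
    union = np.logical_or(mask, recon).sum()
    return inter / union


if __name__ == "__main__":
    print("IoU with 300 of 16384 coefficients:", demo_iou())
```

Because the transform is orthonormal, keeping every coefficient reproduces the mask exactly; the compact vector trades a small reconstruction error near the boundary for a much shorter regression target than the full high-resolution grid.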
22 Citations
Recurrent Contour-based Instance Segmentation with Progressive Learning
- Computer Science
- 2023
The results demonstrate that the proposed PolySnake outperforms the existing contour-based instance segmentation methods on several prevalent instance segmentation benchmarks.
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
- Computer Science
- ArXiv
- 2023
This work devises a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline, with a novel grounding loss that performs explicit and implicit multi-modal feature alignments, and achieves a large improvement of 6.8% mAP on novel classes without extra caption data.
Painterly Image Harmonization in Dual Domains
- Computer Science
- 2022
A novel painterly harmonization network consisting of a dual-domain generator and a dual-domain discriminator, which harmonizes the composite image in both the spatial domain and the frequency domain; experiments show the effectiveness of the method.
Improving Multiple Machine Vision Tasks in the Compressed Domain
- Computer Science
- 2022 26th International Conference on Pattern Recognition (ICPR)
- 2022
This paper improves machine vision tasks in the compressed domain, with better rate-accuracy/distortion and lower complexity compared with the state-of-the-art pixel-domain work that can take both machine and human vision tasks.
RGB no more: Minimally-decoded JPEG Vision Transformers
- Computer Science
- ArXiv
- 2022
This work focuses on training Vision Transformers (ViT) directly from the encoded features of JPEG, and tackles data augmentation directly on these encoded features, which, to the authors' knowledge, has not been explored in depth for training in this setting.
Semantic Communication Enabling Robust Edge Intelligence for Time-Critical IoT Applications
- Computer Science
- ArXiv
- 2022
The proposed Edge Intelligence framework using semantic communication for time-critical IoT applications outperforms the conventional approach under latency and data rate constraints, in particular under ultra-stringent deadlines and low data rates.
Global Spectral Filter Memory Network for Video Object Segmentation
- Computer Science
- ECCV
- 2022
This paper proposes the Global Spectral Filter Memory network (GSFM), which improves intra-frame interaction by learning long-term spatial dependencies in the spectral domain, together with a Low (High) Frequency Module designed to fit this circumstance.
FMNet: Frequency-Aware Modulation Network for SDR-to-HDR Translation
- Computer Science
- ACM Multimedia
- 2022
A frequency-aware modulation block that can dynamically modulate the features according to its frequency-domain responses is designed to enhance the contrast in a frequency-adaptive way for SDR-to-HDR translation and reduce the structural distortions and artifacts in the translated low-frequency regions.
SATMask: Spatial Attention Transform Mask for Dense Instance Segmentation
- Computer Science
- 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC)
- 2022
An anchor-free, single-shot dense image segmentation framework, named SATMask, which adds a Spatial Attention Transform (SAT) mask head to an anchor-free one-stage object detector (FCOS) to predict high-quality instance masks with low complexity, and uses a feature-aligned pyramid network to fuse the feature maps generated by the backbone to obtain rich spatial details and better semantic information.
MFEAFN: Multi-scale feature enhanced adaptive fusion network for image semantic segmentation
- Computer Science
- PLoS ONE
- 2022
This paper proposes a multiscale feature-enhanced adaptive fusion network named MFEAFN to improve semantic segmentation performance and designs a Double Spatial Pyramid Module named DSPM to extract more high-level semantic information.
References
SHOWING 1-10 OF 36 REFERENCES
Mask Encoding for Single Shot Instance Segmentation
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact and fixed-dimensional representation vector, which allows the instance segmentation task to be incorporated into one-stage bounding-box detectors and results in a simple yet efficient instance segmentation framework.
Conditional Convolutions for Instance Segmentation
- Computer Science
- ECCV
- 2020
A simpler instance segmentation method that can achieve improved performance in both accuracy and inference speed on the COCO dataset, and outperform a few recent methods including well-tuned Mask RCNN baselines, without longer training schedules needed.
Boundary-preserving Mask R-CNN
- Computer Science
- ECCV
- 2020
A conceptually simple yet effective Boundary-preserving Mask R-CNN (BMask R-CNN) to leverage object boundary information to improve mask localization accuracy in instance segmentation.
BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.
Mask R-CNN
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2020
This work presents a conceptually simple, flexible, and general framework for object instance segmentation that outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners.
PolarMask: Single Shot Instance Segmentation With Polar Representation
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used by easily embedding it into most…
Exploring Semantic Segmentation on the DCT Representation
- Computer Science
- MMAsia
- 2019
This paper is the first to explore semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard and has an accuracy close to the RGB model at about the same network complexity.
Semantic segmentation of images exploiting DCT based features and random forest
- Computer Science
- Pattern Recognit.
- 2016
Image segmentation based on situational DCT descriptors
- Computer Science
- Pattern Recognit. Lett.
- 2002
SOLOv2: Dynamic and Fast Instance Segmentation
- Computer Science
- NeurIPS
- 2020
State-of-the-art results in object detection (from the authors' mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation.