Exploring Semantic Segmentation on the DCT Representation

Shao-Yuan Lo and Hsueh-Ming Hang. Proceedings of the ACM Multimedia Asia.
Typical convolutional networks are trained and run on RGB images. However, in real-world applications images are often compressed to save memory and to make transmission more efficient. In this paper, we explore methods for performing semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard. We first rearrange the DCT coefficients to form a preferred input type, then we tailor an existing network to the DCT inputs. The proposed method has an accuracy…
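The abstract mentions rearranging DCT coefficients into a network input but does not say how. One common arrangement, shown here only as a minimal sketch, turns each 8×8 block into 64 frequency channels at one-eighth of the spatial resolution. The function names `dct_matrix` and `blockwise_dct_channels` are hypothetical, and the sketch omits the level shift, quantization, and zigzag ordering of a real JPEG codec; it is not necessarily the rearrangement the authors use.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)       # DC row has its own scale factor
    return m


def blockwise_dct_channels(plane: np.ndarray, block: int = 8) -> np.ndarray:
    """Map an (H, W) plane to an (H/block, W/block, block*block) tensor.

    Assumption: H and W are multiples of `block`; a real codec pads instead.
    """
    h, w = plane.shape
    if h % block or w % block:
        raise ValueError("height and width must be multiples of the block size")
    d = dct_matrix(block)
    # Split into non-overlapping blocks: (H/b, b, W/b, b) -> (H/b, W/b, b, b).
    blocks = plane.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # 2-D DCT of every block at once: D @ B @ D^T (matmul broadcasts over blocks).
    coeffs = d @ blocks @ d.T
    # Flatten each block's 8x8 coefficients into 64 channels (row-major order).
    return coeffs.reshape(h // block, w // block, block * block)


if __name__ == "__main__":
    luma = np.random.default_rng(0).random((32, 48))
    print(blockwise_dct_channels(luma).shape)
```

With this layout, channel 0 holds the DC term of each block and higher channels hold progressively finer detail, so a network sees a low-resolution map with many channels instead of a full-resolution map with three.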
Task-Aware Quantization Network for JPEG Image Compression
A deep neural network is proposed for JPEG image compression. It predicts image-specific optimized quantization tables that are fully compatible with the standard JPEG encoder and decoder, and it can learn task-specific quantization tables in a principled way by adjusting the objective function of the network.
Deep Learning Based Image Retrieval in the JPEG Compressed Domain
This work proposes a unified model for image retrieval which takes DCT coefficients as input and efficiently extracts global and local features directly in the JPEG compressed domain for accurate image retrieval.
Analyzing and Mitigating JPEG Compression Defects in Deep Learning
It is shown that there is a significant penalty on common performance metrics for high compression, and several methods are tested for mitigating this penalty, including a novel method based on artifact correction which requires no labels to train.
Analyzing and Mitigating Compression Defects in Deep Learning
It is shown that there is a significant penalty on common performance metrics for high compression, and several methods are tested for mitigating this penalty, including a novel method based on artifact correction which requires no labels to train.
Object Detection in the DCT Domain: is Luminance the Solution?
This paper focuses on JPEG images and proposes a thorough analysis of detection architectures newly designed with regard to the peculiarities of the JPEG standard, which leads to a ×1.7 speed-up in comparison with a standard RGB-based architecture.
SOLQ: Segmenting Objects by Learning Queries
This paper proposes an end-to-end framework for instance segmentation that segments objects by learning unified queries, based on the recently introduced DETR, and shows that SOLQ can achieve state-of-the-art performance, surpassing most existing approaches.
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
This paper proposes a new mask representation by applying the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector, termed DCT-Mask, which could be easily integrated into most pixel-based instance segmentation methods.
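The idea of encoding a binary mask as a compact DCT vector can be illustrated with a small sketch. It assumes a square mask and keeps the top-left k×k block of low-frequency coefficients, which is a simplification chosen here for brevity and not necessarily the selection rule used in the DCT-Mask paper. The names `encode_mask` and `decode_mask` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m


def encode_mask(mask: np.ndarray, k: int) -> np.ndarray:
    """Encode a square binary mask as a vector of k*k low-frequency coefficients."""
    n = mask.shape[0]
    if mask.shape != (n, n) or not 1 <= k <= n:
        raise ValueError("expected a square mask and 1 <= k <= mask size")
    d = dct_matrix(n)
    coeffs = d @ mask.astype(np.float64) @ d.T  # full 2-D DCT of the mask
    return coeffs[:k, :k].ravel()               # keep the low-frequency corner


def decode_mask(vec: np.ndarray, n: int) -> np.ndarray:
    """Rebuild an n x n binary mask from the vector produced by encode_mask."""
    k = int(round(np.sqrt(vec.size)))
    coeffs = np.zeros((n, n))
    coeffs[:k, :k] = vec.reshape(k, k)          # discarded coefficients stay zero
    d = dct_matrix(n)
    recon = d.T @ coeffs @ d                    # inverse of the orthonormal DCT
    return recon > 0.5                          # threshold back to a binary mask


if __name__ == "__main__":
    mask = np.zeros((32, 32), dtype=bool)
    mask[8:24, 8:24] = True
    vec = encode_mask(mask, k=16)
    restored = decode_mask(vec, n=32)
    iou = (mask & restored).sum() / (mask | restored).sum()
    print(vec.size, float(iou))
```

Keeping every coefficient (k equal to the mask size) reproduces the mask exactly; smaller k trades boundary accuracy for a shorter vector.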
Quantization Guided JPEG Artifact Correction
This work creates a novel architecture which is parameterized by the JPEG file's quantization matrix, which allows a single model to achieve state-of-the-art performance over models trained for specific quality settings.


Deep feature extraction in the DCT domain
The results indicate that a DCT operation incorporated into the network after convolution+thresholding and before pooling can have certain advantages such as convergence over fewer training epochs and sparser weight matrices that are more conducive to pruning and hashing techniques.
Fast object detection in compressed JPEG Images
This paper modifies the well-known Single Shot MultiBox Detector by replacing its first layers with one convolutional layer dedicated to processing the DCT inputs, and proposes a fast deep architecture for object detection in JPEG images, one of the most widespread compression formats.
DCT-domain Deep Convolutional Neural Networks for Multiple JPEG Compression Classification
This paper addresses the problem of classifying images by the number of JPEG compressions they have undergone, using deep convolutional neural networks in the DCT domain. A well-designed pre-processing step is applied before the image data is fed to the CNN, in order to capture essential characteristics of compression artifacts and make the system independent of image content.
A multi-branch convolutional neural network for detecting double JPEG compression
This paper presents a CNN solution that uses raw DCT (discrete cosine transform) coefficients from JPEG images as input and is designed to reveal whether a JPEG image has been doubly compressed.
Deep Residual Learning in the JPEG Transform Domain
  • Max Ehrlich, L. Davis
  • Computer Science, Mathematics
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
A general method is presented for performing Residual Network inference and learning in the JPEG transform domain, which allows the network to consume compressed images as input. It is shown that the sparsity of the JPEG format allows for faster processing of images with little to no penalty in network accuracy.
Towards Image Understanding from Deep Compression without Decoding
This study shows that accuracies comparable to networks that operate on compressed RGB images can be achieved while reducing the computational complexity up to $2\times$, and finds that inference from compressed representations is particularly advantageous compared to inference from compressed RGB images for aggressive compression rates.
Faster Neural Networks Straight from JPEG
A simple idea is proposed and explored: train CNNs directly on the blockwise discrete cosine transform (DCT) coefficients computed and available in the middle of the JPEG codec, with the codec modified to produce DCT coefficients directly. The approach is evaluated on ImageNet.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
This work addresses the task of semantic image segmentation with deep learning. It proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
Compressed Video Action Recognition
This work proposes to train a deep network directly on compressed video (H.264, HEVC, etc.), which has a higher information density, and finds the training to be easier.
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation
A novel convolutional network named Efficient Dense modules with Asymmetric convolution (EDANet) is proposed, which employs an asymmetric convolution structure and incorporates dilated convolution and dense connectivity to achieve high efficiency at low computational cost and model size.