Object Detection in the DCT Domain: is Luminance the Solution?

Benjamin Deguerre, Clément Chatelain, Gilles Gasso. 2020 25th International Conference on Pattern Recognition (ICPR).
Object detection in images has reached unprecedented performance. State-of-the-art methods rely on deep architectures that extract salient features and predict bounding boxes enclosing the objects of interest. These methods essentially run on RGB images. However, RGB images are often compressed by acquisition devices for storage and transfer efficiency, so they must be decompressed before object detectors can process them. To gain in efficiency, this paper proposes to take advantage…
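As a rough illustration of the JPEG-side representation the abstract refers to, the sketch below computes the luminance channel Y from RGB (using the JFIF weights derived from BT.601) and an orthonormal 8×8 block DCT-II, rearranging each block's 64 coefficients into channels. It assumes height and width divisible by 8 and skips chroma subsampling, quantization and entropy coding; the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def rgb_to_luma(rgb: np.ndarray) -> np.ndarray:
    """Y channel of JFIF YCbCr from an (H, W, 3) RGB array in [0, 255]."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0] *= 1 / np.sqrt(2)          # DC row gets the extra 1/sqrt(2) factor
    return c * np.sqrt(2 / n)

def block_dct_channels(y: np.ndarray, n: int = 8) -> np.ndarray:
    """Map an (H, W) plane to (H/n, W/n, n*n) blockwise DCT coefficients."""
    h, w = y.shape
    assert h % n == 0 and w % n == 0, "sketch assumes H and W divisible by n"
    d = dct_matrix(n)
    # JPEG-style level shift, then split into blocks ordered (H/n, W/n, n, n).
    blocks = (y - 128.0).reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    # 2-D DCT of every block at once: D @ B @ D^T (matmul broadcasts over blocks).
    coeffs = d @ blocks @ d.T
    return coeffs.reshape(h // n, w // n, n * n)

rgb = np.random.default_rng(0).integers(0, 256, size=(32, 48, 3)).astype(float)
features = block_dct_channels(rgb_to_luma(rgb))
print(features.shape)  # (4, 6, 64)
```

Each spatial position of the output summarizes one 8×8 pixel block, so the tensor is already at 1/8 of the input resolution.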


Fast object detection in compressed JPEG Images
This paper modifies the well-known Single Shot MultiBox Detector (SSD) by replacing its first layers with a single convolutional layer dedicated to processing the DCT inputs, and proposes a fast deep architecture for object detection in JPEG images (JPEG being one of the most widespread compression formats).
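The idea of feeding DCT coefficients straight into a convolutional layer can be pictured with the sketch below. It is not the paper's layer: purely for illustration, it assumes a hypothetical 1×1 convolution (a per-block linear projection of the 64 coefficients) with random weights and a hypothetical width of 256 channels. Because each input position already covers an 8×8 pixel block, the layer operates at 1/8 of the image resolution from the start.

```python
import numpy as np

def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution: x is (H, W, C_in), weight is (C_in, C_out), bias is (C_out,)."""
    return np.einsum("hwc,co->hwo", x, weight) + bias

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
# Stand-in for the DCT input of a 304x304 luminance image: 38x38 blocks, 64 coefficients each.
dct_input = rng.standard_normal((38, 38, 64))
# Hypothetical layer: 64 -> 256 channels, He-style random init, zero bias.
w = rng.standard_normal((64, 256)) * np.sqrt(2.0 / 64)
b = np.zeros(256)
out = relu(conv1x1(dct_input, w, b))
print(out.shape)  # (38, 38, 256)
```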
Exploring Semantic Segmentation on the DCT Representation
This paper is the first to explore semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard; the resulting model achieves accuracy close to that of the RGB model at about the same network complexity.
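In a JPEG file, the DCT coefficients are stored quantized: each 8×8 coefficient block is divided element-wise by a quantization table and rounded. The sketch below uses the example luminance table from Annex K of ITU-T T.81 (encoders may scale it or use their own tables) and shows that dequantization only approximately recovers the coefficients. It illustrates the standard, not the segmentation paper's pipeline.

```python
import numpy as np

# Example luminance quantization table from ITU-T T.81, Annex K.
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

def quantize(block: np.ndarray, q: np.ndarray = Q_LUMA) -> np.ndarray:
    """Divide DCT coefficients by the table and round to integers."""
    return np.round(block / q).astype(int)

def dequantize(qblock: np.ndarray, q: np.ndarray = Q_LUMA) -> np.ndarray:
    """Multiply back by the table; the rounding error is not recoverable."""
    return qblock * q

coeffs = np.zeros((8, 8))
coeffs[0, 0], coeffs[0, 1], coeffs[7, 7] = 412.0, -37.0, 45.0
q = quantize(coeffs)
print(q[0, 0], q[0, 1], q[7, 7])  # 26 -3 0
```

The high-frequency coefficient (45 at position (7, 7)) is zeroed by its large divisor, while the DC term survives as 26 × 16 = 416 instead of 412.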
Fast Object Detection in Compressed Video
This paper proposes a fast object detection method that takes advantage of both the motion vectors and the residual errors freely available in video streams, and is the first work to investigate a deep convolutional detector on compressed videos.
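Motion vectors and residuals come from the codec's motion-compensated prediction: roughly, each block of a predicted frame is copied from a displaced block of a reference frame, and a residual is added. The sketch below shows that reconstruction with one integer motion vector per 8×8 block and no sub-pixel interpolation, a simplification of real codecs such as H.264; the function and its argument layout are hypothetical.

```python
import numpy as np

def motion_compensate(ref, mvs, residual, block=8):
    """Reconstruct a frame from a reference frame, per-block motion vectors and a residual.

    ref, residual: (H, W) arrays; mvs: (H/block, W/block, 2) integer (dy, dx) offsets
    pointing from each block of the current frame into the reference frame.
    """
    h, w = ref.shape
    assert h % block == 0 and w % block == 0, "sketch assumes whole blocks"
    pred = np.empty_like(ref)
    for by in range(h // block):
        for bx in range(w // block):
            dy, dx = mvs[by, bx]
            # Clamp so the displaced block stays inside the reference frame
            # (a simplification of how real codecs handle frame borders).
            y0 = int(np.clip(by * block + dy, 0, h - block))
            x0 = int(np.clip(bx * block + dx, 0, w - block))
            pred[by * block:(by + 1) * block, bx * block:(bx + 1) * block] = \
                ref[y0:y0 + block, x0:x0 + block]
    return pred + residual

ref = np.arange(256, dtype=float).reshape(16, 16)
mvs = np.zeros((2, 2, 2), dtype=int)
mvs[0, 0] = (8, 8)  # top-left block copies the bottom-right block of the reference
cur = motion_compensate(ref, mvs, np.zeros_like(ref))
print(np.array_equal(cur[:8, :8], ref[8:, 8:]))  # True
```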
Faster and Accurate Classification for JPEG2000 Compressed Images in Networked Applications
This work proposes to remove the computationally costly reconstruction step by training a deep CNN image classifier on CDF 9/7 discrete wavelet transform (DWT) coefficients extracted directly from j2k-compressed images, and shows that traditional augmentation transforms such as flipping and shifting are ineffective in the DWT domain.
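One property plausibly behind the shifting observation is that a decimated wavelet transform is shift-variant: shifting the input by one sample does not simply shift the coefficients. The sketch below demonstrates this with a one-level orthonormal Haar transform, used here as a simpler stand-in for the CDF 9/7 wavelet named in the paper; the decimation effect is the same in kind, but the numbers differ.

```python
import numpy as np

def haar_level1(x):
    """One level of the orthonormal Haar DWT on an even-length 1-D signal."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)  # non-overlapping pairs
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

x = np.arange(1, 9, dtype=float)                # [1, 2, ..., 8]
a, d = haar_level1(x)
a_shift, d_shift = haar_level1(np.roll(x, 1))   # input circularly shifted by one sample

# Is the shifted-input approximation band just a circular shift of the original band?
is_shifted_copy = any(np.allclose(a_shift, np.roll(a, k)) for k in range(len(a)))
print(is_shifted_copy)  # False
```

A shift by two samples, by contrast, maps exactly onto a one-coefficient shift of the approximation band, which is why only shifts aligned with the decimation factor behave predictably.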
SSD: Single Shot MultiBox Detector
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
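The default boxes can be sketched from the scale rule in the SSD paper: for m feature maps, s_k = s_min + (s_max − s_min)(k − 1)/(m − 1) with s_min = 0.2 and s_max = 0.9; a box with aspect ratio a has width s_k·√a and height s_k/√a, and an extra square box uses scale √(s_k·s_{k+1}). The code generates normalized (cx, cy, w, h) boxes for one square feature map, using the subset {1, 2, 1/2} of the paper's aspect ratios {1, 2, 3, 1/2, 1/3} and an example m = 6. Released implementations differ in details, so treat this as a sketch.

```python
import itertools
import math

def ssd_scale(k, m, s_min=0.2, s_max=0.9):
    """Scale of the k-th (1-indexed) of m feature maps, per the linear SSD rule."""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)

def default_boxes(fmap_size, k, m, aspect_ratios=(1.0, 2.0, 0.5)):
    """Normalized (cx, cy, w, h) default boxes for a square fmap_size x fmap_size map."""
    s_k = ssd_scale(k, m)
    # Extra square box; for k == m the rule is extrapolated one step past s_max.
    s_extra = math.sqrt(s_k * ssd_scale(k + 1, m))
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size  # cell centers
        for a in aspect_ratios:
            boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
        boxes.append((cx, cy, s_extra, s_extra))
    return boxes

boxes = default_boxes(fmap_size=3, k=6, m=6)
print(len(boxes))  # 36, i.e. 3 * 3 locations * (3 ratios + 1 extra box)
```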
Focal Loss for Dense Object Detection
This paper proposes to address the extreme foreground-background class imbalance encountered during the training of dense detectors by reshaping the standard cross entropy loss so that it down-weights the loss assigned to well-classified examples. The resulting Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
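The reshaped loss is FL(p_t) = −α_t (1 − p_t)^γ log(p_t), where p_t is the predicted probability of the true class; the paper's defaults are γ = 2 and α = 0.25 (α_t = α for positives, 1 − α for negatives). The NumPy sketch below implements this binary (sigmoid) form and compares it with plain cross entropy for one easy negative and one hard positive.

```python
import numpy as np

def _p_true(p, y):
    """Probability assigned to the true class of each example."""
    return np.where(y == 1, p, 1.0 - p)

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss per example; p are foreground probabilities, y labels in {0, 1}."""
    p, y = np.asarray(p, dtype=float), np.asarray(y)
    p_t = _p_true(p, y)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, eps, 1.0))

def cross_entropy(p, y, eps=1e-12):
    p, y = np.asarray(p, dtype=float), np.asarray(y)
    return -np.log(np.clip(_p_true(p, y), eps, 1.0))

# An easy negative (p = 0.01 on a background anchor) and a hard positive (p = 0.1).
p = np.array([0.01, 0.1])
y = np.array([0, 1])
ratio = focal_loss(p, y) / cross_entropy(p, y)
print(ratio)
```

The easy negative's loss is scaled by 0.75 · 0.01² = 7.5e-5 relative to cross entropy, the hard positive's only by 0.25 · 0.9² = 0.2025.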
Tiny SSD: A Tiny Single-Shot Detection Deep Convolutional Neural Network for Real-Time Embedded Object Detection
Tiny SSD is a single-shot detection deep convolutional neural network for real-time embedded object detection. It is composed of a highly optimized, non-uniform Fire subnetwork stack and a non-uniform stack of highly optimized SSD-based auxiliary convolutional feature layers, designed specifically to minimize model size while maintaining object detection performance.
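The Fire modules used in Tiny SSD come from SqueezeNet: a 1×1 "squeeze" convolution reduces the channel count, then parallel 1×1 and 3×3 "expand" convolutions are concatenated. The arithmetic below (weights only, biases ignored) shows why this saves parameters; the channel sizes are illustrative, not Tiny SSD's actual configuration.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Weight count of a SqueezeNet-style Fire module (biases ignored)."""
    return (conv_params(c_in, squeeze, 1)          # squeeze layer
            + conv_params(squeeze, expand1x1, 1)   # 1x1 expand branch
            + conv_params(squeeze, expand3x3, 3))  # 3x3 expand branch

# Illustrative sizes: 128 input channels, 64 + 64 = 128 output channels.
fire = fire_params(128, squeeze=16, expand1x1=64, expand3x3=64)
plain = conv_params(128, 128, 3)
print(fire, plain)  # 12288 147456
```

In this example the Fire module needs about 12 times fewer weights than a plain 3×3 convolution with the same input and output widths.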
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.
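The mAP figure averages, over classes, the average precision (AP) of each class's ranked detections; in the PASCAL VOC protocol a detection is a true positive when it overlaps an unmatched ground-truth box with IoU ≥ 0.5. The sketch below computes AP with the all-point interpolation VOC adopted from 2010 on (VOC 2007 used an 11-point variant). It assumes detections are already sorted by confidence and matched to ground truth, and does not reproduce R-CNN's own evaluation code.

```python
import numpy as np

def average_precision(tp_flags, num_gt):
    """All-point interpolated AP.

    tp_flags: 1 for a true positive, 0 for a false positive, in descending confidence order.
    num_gt: number of ground-truth objects of this class.
    """
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Pad the curve, then make precision monotonically non-increasing in recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

print(average_precision([1, 0, 1], num_gt=2))  # ≈ 0.833
```

mAP is then the mean of this value across the object classes.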
Very Deep Convolutional Networks for Large-Scale Image Recognition
This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small (3×3) convolution filters, and shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
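The case for very small filters can be checked with simple arithmetic along the lines of the VGG paper's own argument: a stack of n stride-1 3×3 convolutions has a (2n + 1)×(2n + 1) effective receptive field but uses 9nC² weights for C input and output channels, versus (2n + 1)²C² for a single filter of that size. The snippet below tabulates this for n = 2 and n = 3 (biases ignored; C = 256 is an arbitrary example).

```python
def stacked_3x3(n, c):
    """Receptive field and weight count of n stacked 3x3, stride-1 convs with c channels."""
    return 2 * n + 1, n * 9 * c * c

def single_kxk(k, c):
    """Weight count of one k x k conv with c input and c output channels."""
    return k * k * c * c

for n in (2, 3):
    rf, stacked = stacked_3x3(n, c=256)
    print(f"{n} x 3x3 -> {rf}x{rf} field: {stacked} weights vs {single_kxk(rf, 256)}")
```

For C = 256 this gives 1,179,648 vs 1,638,400 weights for a 5×5 field and 1,769,472 vs 3,211,264 for a 7×7 field, i.e. 18C² vs 25C² and 27C² vs 49C², while the stack also interleaves extra non-linearities.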
Compressed Video Action Recognition
This work proposes to train a deep network directly on compressed video (H.264, HEVC, etc.), whose representation has a higher information density, and finds training to be easier.