Learning in the Frequency Domain

  • Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-kuang Chen, Fengbo Ren
  • Published 27 February 2020
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, they remove both redundant and salient information obliviously, which results in accuracy degradation…

Towards Robust 2D Convolution for Reliable Visual Recognition

This paper designs a novel building block, denoted by RConv-MK, to strengthen the robustness of extracted convolutional features, which leverages a set of learnable kernels of different sizes to extract features at different frequencies, and employs a normalized soft thresholding operator to adaptively remove noise and trivial features at different corruption levels.
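
For readers unfamiliar with the operator, the sketch below shows plain soft thresholding, which shrinks every value toward zero by a threshold and zeroes anything smaller than it. It is only an illustration of the basic operator: the normalization and the adaptive, learned threshold described in the summary are not reproduced, and the name `soft_threshold` and the fixed `tau` are assumptions made here.

```python
import math

def soft_threshold(values, tau):
    """Shrink each value toward zero by tau; values with |v| <= tau become 0.

    Hypothetical helper for illustration; tau is a fixed, non-negative
    threshold here, whereas the summarized block adapts it.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    out = []
    for v in values:
        mag = abs(v) - tau
        # Keep the sign of v, reduce its magnitude by tau, clip at zero.
        out.append(math.copysign(mag, v) if mag > 0 else 0.0)
    return out
```

Small responses (likely noise) are removed outright, while large responses survive with slightly reduced magnitude.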

Deep Learning Based Image Retrieval in the JPEG Compressed Domain

This work proposes a unified model for image retrieval which takes DCT coefficients as input and efficiently extracts global and local features directly in the JPEG compressed domain for accurate image retrieval.
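
As a rough illustration of what "DCT coefficients as input" can look like, the sketch below computes the 8x8 block DCT-II used by JPEG and regroups the coefficients so that each of the 64 frequencies forms one channel. It is a minimal pure-Python sketch, not the cited model's pipeline: the names `dct_8x8` and `blocks_to_frequency_channels` are hypothetical, the input is assumed to be a single grayscale plane whose sides are multiples of 8, and JPEG quantization and entropy coding are ignored.

```python
import math

N = 8  # JPEG block size

def _alpha(k):
    # Orthonormal DCT-II scaling factor.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# DCT-II basis: C[k][n] = alpha(k) * cos((2n + 1) * k * pi / (2N))
C = [[_alpha(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def dct_8x8(block):
    """2D orthonormal DCT-II of one 8x8 block (8 rows of 8 numbers)."""
    # First pass: transform along the vertical axis.
    tmp = [[sum(C[u][y] * block[y][x] for y in range(N)) for x in range(N)]
           for u in range(N)]
    # Second pass: transform along the horizontal axis.
    return [[sum(tmp[u][x] * C[v][x] for x in range(N)) for v in range(N)]
            for u in range(N)]

def blocks_to_frequency_channels(image):
    """Return channels[u * 8 + v][by][bx]: coefficient (u, v) of block (by, bx)."""
    h, w = len(image), len(image[0])
    if h % N or w % N:
        raise ValueError("image sides must be multiples of 8")
    bh, bw = h // N, w // N
    channels = [[[0.0] * bw for _ in range(bh)] for _ in range(N * N)]
    for by in range(bh):
        for bx in range(bw):
            block = [row[bx * N:(bx + 1) * N]
                     for row in image[by * N:(by + 1) * N]]
            coeffs = dct_8x8(block)
            for u in range(N):
                for v in range(N):
                    channels[u * N + v][by][bx] = coeffs[u][v]
    return channels
```

With this layout an H x W plane becomes 64 channels of size (H/8) x (W/8), and channel 0 holds the DC (average-intensity) term of every block.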

Frequency-domain Learning for Volumetric-based 3D Data Perception

This study finds that 3D CNNs are sensitive to a limited number of critical frequency channels, especially low-frequency channels, and frequency-domain learning can significantly reduce the size of volumetric-based 3D inputs (based on spectral bias) while achieving comparable accuracy with conventional spatial-domain learning approaches.

Improving Multiple Machine Vision Tasks in the Compressed Domain

This paper improves the machine vision tasks in the compressed domain with better rate-accuracy/distortion and lower complexity compared with the state-of-the-art pixel-domain work that can take both machine and human vision tasks.

Few-Shot Learning for Plant-Disease Recognition in the Frequency Domain

This work introduces frequency representation into the FSL paradigm for plant-disease recognition, and shows that the performance is much better in the frequency domain than in the spatial domain, and the Gaussian-like calibrator further improves the performance.

Privacy-Preserving Face Recognition in the Frequency Domain

Results show that the proposed scheme achieves a recognition performance and inference time comparable to ArcFace operating on original face images directly, and a fast masking method is proposed that is validated over several large face datasets.

Pure Frequency-Domain Deep Neural Network for IoT-Enabled Smart Cameras

This study is the first to realize an FD fully connected layer, which can better represent a spectral feature distribution and improve frames per second and memory usage, and save approximately 26.09% of power consumption for the MNIST data set.

Faster-FCoViAR: Faster Frequency-Domain Compressed Video Action Recognition

This work proposes a novel, faster frequency-domain compressed video action recognition framework (termed Faster-FCoViAR), which consists of a frequency-domain partial decompression method (FPDec), a frequency-domain channel selection strategy (FCS), and a spatial-to-frequency domain student-teacher network (S2FNet).

Medical Frequency Domain Learning: Consider Inter-class and Intra-class Frequency for Medical Image Segmentation and Classification

A method of learning in the frequency domain to train CNNs, called the Frequency Domain Attention (FDAM) workflow, which requires only a small increase in parameters and minor modifications to CNNs to improve accuracy and reduce computation.

Signal Processing Transformations in Scene Recognition from Satellite Imagery

We propose that traditional signal processing transformations, namely the Fourier Transform and Wavelet Transform, extract meaningful information from visual data for the purpose of image



Mask R-CNN

This work presents a conceptually simple, flexible, and general framework for object instance segmentation that outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners.

Effects of Image Degradation and Degradation Removal to CNN-Based Image Classification

Whether image classification performance drops with each kind of degradation, whether this drop can be avoided by including degraded images in training, and whether existing computer vision algorithms that attempt to remove such degradations can help improve the image classification performance are studied.

Dynamic Recursive Neural Network

It is demonstrated that the DRNN can achieve better performance with fewer blocks by employing blocks recursively, reducing the parameters and computational cost while consistently outperforming the original model in terms of accuracy on CIFAR-10 and ImageNet-1k.

Importance Estimation for Neural Network Pruning

A novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores and two variations of this method using the first and second-order Taylor expansions to approximate a filter's contribution are described.
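
To make the first-order idea concrete: a first-order Taylor expansion approximates the change in loss from zeroing a filter's weights by the sum of gradient times weight over that filter, and squaring the result gives a non-negative score. The sketch below assumes that squared gradient-weight form, which the summary above does not spell out. The names `first_order_importance` and `rank_filters` are hypothetical, and the weights and gradients are plain lists rather than tensors from a real framework.

```python
def first_order_importance(weights, grads):
    """Squared first-order Taylor estimate of the loss change if the
    filter's weights were set to zero (assumed form, for illustration)."""
    return sum(w * g for w, g in zip(weights, grads)) ** 2

def rank_filters(filters):
    """filters maps a filter name to a (weights, grads) pair.

    Returns the names ordered from least to most important, so the
    first entries are the candidates to prune in the next iteration.
    """
    scores = {name: first_order_importance(w, g)
              for name, (w, g) in filters.items()}
    return sorted(scores, key=scores.get)
```

In an iterative scheme, the lowest-ranked filters would be removed, the network fine-tuned, and the scores recomputed before the next round.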

MMDetection: Open MMLab Detection Toolbox and Benchmark

This paper presents MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules, and conducts a benchmarking study on different methods, components, and their hyper-parameters.

Band-limited Training and Inference for Convolutional Neural Networks

The convolutional layers are core building blocks of neural network architectures. In general, a convolutional filter applies to the entire frequency spectrum of the input data. We explore

Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Managment

A layer-conscious memory hierarchy (LCMH) methodology for DNN accelerators that could achieve up to 36% speedup compared with the designs with UMH and 5% improvement over state-of-the-art designs.

Deep Residual Learning in the JPEG Transform Domain

  • Max Ehrlich, L. Davis
  • Computer Science
    2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
A general method of performing Residual Network inference and learning in the JPEG transform domain that allows the network to consume compressed images as input and shows that the sparsity of the JPEG format allows for faster processing of images with little to no penalty in the network accuracy.

HAQ: Hardware-Aware Automated Quantization With Mixed Precision

The Hardware-Aware Automated Quantization (HAQ) framework is introduced, which leverages reinforcement learning to automatically determine the quantization policy and takes the hardware accelerator's feedback in the design loop to generate direct feedback signals to the RL agent.

You Look Twice: GaterNet for Dynamic Filter Selection in CNNs

This paper investigates input-dependent dynamic filter selection in deep convolutional neural networks (CNNs) and proposes a novel yet simple framework called GaterNet, which involves a backbone and a gater network.