Learning in the Frequency Domain

@inproceedings{Xu2020LearningIT,
  title={Learning in the Frequency Domain},
  author={Kai Xu and Minghai Qin and Fei Sun and Yuhao Wang and Yen-kuang Chen and Fengbo Ren},
  booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={1737--1746}
}
  • Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-kuang Chen, Fengbo Ren
  • Published 27 February 2020
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though downsampling reduces computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation…

Improving Multiple Machine Vision Tasks in the Compressed Domain

This paper improves machine vision tasks in the compressed domain, achieving better rate-accuracy/distortion performance and lower complexity than state-of-the-art pixel-domain work that handles both machine and human vision tasks.

Few-Shot Learning for Plant-Disease Recognition in the Frequency Domain

This work introduces frequency representation into the few-shot learning (FSL) paradigm for plant-disease recognition, and shows that performance is much better in the frequency domain than in the spatial domain and that a Gaussian-like calibrator improves it further.

Privacy-Preserving Face Recognition in the Frequency Domain

Results show that the proposed scheme achieves recognition performance and inference time comparable to ArcFace operating directly on the original face images. A fast masking method is also proposed and validated on several large face datasets.

Pure Frequency-Domain Deep Neural Network for IoT-Enabled Smart Cameras

This study is the first to realize an FD fully connected layer, which can better represent a spectral feature distribution and improve frames per second and memory usage, and save approximately 26.09% of power consumption for the MNIST data set.

Medical Frequency Domain Learning: Consider Inter-class and Intra-class Frequency for Medical Image Segmentation and Classification

This work presents a method for training CNNs in the frequency domain, called the Frequency domain attention (FDAM) Workflow, which requires only a small increase in parameters and minor modifications to CNNs to improve accuracy and reduce computation.

RGB no more: Minimally-decoded JPEG Vision Transformers

This work focuses on training Vision Transformers (ViT) directly from the encoded features of JPEG, and tackles data augmentation directly on these encoded features, which, to the authors' knowledge, has not been explored in depth for training in this setting.

FD-CNN: A Frequency-Domain FPGA Acceleration Scheme for CNN-based Image Processing Applications

This work proposes FD-CNN, a novel CNN accelerator that leverages partial decoding to accelerate CNNs directly in the frequency domain, together with an image-decoding-aware design-space exploration (DSE) workflow to optimize the pipeline.

Boosting Night-time Scene Parsing with Learnable Frequency

This paper exploits image frequency distributions for night-time scene parsing. It proposes a Learnable Frequency Encoder (LFE) to model the relationship between different frequency coefficients, and a Spatial Frequency Fusion module (SFF) that fuses spatial and frequency information to guide the extraction of spatial context features.

DuetFace: Collaborative Privacy-Preserving Face Recognition via Channel Splitting in the Frequency Domain

Results show that the proposed DuetFace achieves a comparable recognition accuracy and computation cost to the unprotected ArcFace and outperforms the state-of-the-art privacy-preserving methods.

Detecting Camouflaged Object in Frequency Domain

The goal of the camouflaged object detection (COD) task is not just to mimic human visual ability in a single RGB domain, but to go beyond human biological vision. The proposed method outperforms other state-of-the-art methods by a large margin.
...