Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add

  Tyler Highlander, Andres Rodriguez
Convolutional neural networks (CNNs) are currently state-of-the-art for various classification tasks, but are computationally expensive. Propagating through the convolutional layers is very slow, as each kernel in each layer must sequentially calculate many dot products for a single forward and backward propagation, which equates to $\mathcal{O}(N^{2}n^{2})$ operations per kernel per layer, where the inputs are $N \times N$ arrays and the kernels are $n \times n$ arrays. Convolution can be efficiently… 
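The complexity comparison in the abstract can be illustrated with a minimal numpy sketch (illustrative only, not the paper's implementation): direct convolution, plain FFT convolution, and FFT with overlap-and-add all compute the same full 2-D linear convolution, but the overlap-and-add variant only ever transforms small fixed-size tiles.

```python
import numpy as np

def direct_conv2d(x, k):
    """Naive full 2-D convolution: O(N^2 n^2) multiply-adds per kernel."""
    N, n = x.shape[0], k.shape[0]
    out = np.zeros((N + n - 1, N + n - 1))
    for i in range(N):
        for j in range(N):
            out[i:i + n, j:j + n] += x[i, j] * k  # scatter one input sample
    return out

def fft_conv2d(x, k):
    """Convolution as pointwise multiplication of zero-padded spectra."""
    s = x.shape[0] + k.shape[0] - 1
    return np.fft.ifft2(np.fft.fft2(x, (s, s)) * np.fft.fft2(k, (s, s))).real

def oaa_conv2d(x, k, block=8):
    """Overlap-and-add: FFT-convolve small tiles, sum the overlapping borders."""
    N, n = x.shape[0], k.shape[0]
    s = block + n - 1                       # FFT size per tile
    Kf = np.fft.fft2(k, (s, s))             # kernel spectrum computed once
    out = np.zeros((N + n - 1, N + n - 1))
    for i in range(0, N, block):
        for j in range(0, N, block):
            tile = x[i:i + block, j:j + block]
            h, w = tile.shape               # edge tiles may be smaller
            piece = np.fft.ifft2(np.fft.fft2(tile, (s, s)) * Kf).real
            out[i:i + h + n - 1, j:j + w + n - 1] += piece[:h + n - 1, :w + n - 1]
    return out

rng = np.random.default_rng(0)
x, k = rng.standard_normal((16, 16)), rng.standard_normal((3, 3))
assert np.allclose(direct_conv2d(x, k), fft_conv2d(x, k))
assert np.allclose(direct_conv2d(x, k), oaa_conv2d(x, k))
```

The tile size trades off FFT overhead against padding waste: each tile costs roughly $\mathcal{O}(s^{2}\log s)$ for $s = \text{block} + n - 1$, so small kernels benefit from small tiles rather than one large full-image FFT.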


End-to-End Training of Deep Neural Networks in the Fourier Domain
This paper presents a method for implementing neural networks entirely in the Fourier domain, thereby saving multiplications and the inverse Fourier transform operations.
Accelerating Convolutional Neural Network With FFT on Embedded Hardware
Three variations of convolutions are evaluated, including direct convolution, fast Fourier transform-based convolution (FFT-Conv), and FFT overlap and add convolution for popular CNN networks in embedded hardware to explore the tradeoff between software and hardware implementation, domain-specific logic and instructions, as well as various parallelism across different architectures.
A Fast Algorithm for Convolutional Neural Networks Using Tile-based Fast Fourier Transforms
The tile-based decomposition strategy is introduced into Fourier transforms to yield a fast convolution algorithm, called tFFT, which reduces the arithmetic complexity of CNNs by over a factor of 3 compared to FFT-based convolution algorithms.
Acceleration of Convolutional Neural Network Using FFT-Based Split Convolutions
A new method for CNN processing in the FFT domain is proposed, which is based on input splitting, which reduces the complexity of the computations required for FFT and increases efficiency.
Fast Convolutional Neural Networks with Fine-Grained FFTs
This paper analyzes the origin of the redundancy generated by the im2col process, and reveals a new data pattern to more mathematically concisely describe the matrix representation for convolution, which is implemented as a FFT-based convolution with finer FFT granularity.
Low-memory GEMM-based convolution algorithms for deep neural networks
Two novel GEMM-based algorithms that require just a fraction of the amount of additional memory for DNN convolution, making it much more suitable for memory-limited embedded systems are presented.
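For contrast with the FFT-based approaches above, the GEMM route discussed in the last few entries lowers convolution to a single matrix multiply via an im2col buffer. A minimal numpy sketch of the standard (memory-hungry) baseline, not the authors' low-memory algorithms:

```python
import numpy as np

def im2col(x, n):
    """Copy every n-by-n patch of x into a row. This redundant buffer is
    the memory cost that low-memory GEMM algorithms aim to avoid."""
    out_dim = x.shape[0] - n + 1
    cols = np.empty((out_dim * out_dim, n * n))
    for i in range(out_dim):
        for j in range(out_dim):
            cols[i * out_dim + j] = x[i:i + n, j:j + n].ravel()
    return cols

def gemm_conv2d(x, k):
    """'Valid' cross-correlation expressed as one matrix-vector GEMM."""
    n = k.shape[0]
    out_dim = x.shape[0] - n + 1
    return (im2col(x, n) @ k.ravel()).reshape(out_dim, out_dim)

rng = np.random.default_rng(1)
x, k = rng.standard_normal((8, 8)), rng.standard_normal((3, 3))
# Reference: explicit sliding-window cross-correlation.
ref = np.array([[np.sum(x[i:i + 3, j:j + 3] * k) for j in range(6)]
                for i in range(6)])
assert np.allclose(gemm_conv2d(x, k), ref)
```

Note the redundancy the "Fine-Grained FFTs" paper analyzes: for an $n \times n$ kernel, each input element is copied into up to $n^{2}$ rows of the im2col buffer.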
Activating frequencies: Exploring non-linearities in the Fourier domain
This thesis is the first published study of activation functions in the frequency domain of CNNs; several potential candidates for an activation function that works directly in the frequency domain are studied.
Modeling information flow through deep convolutional neural networks
Filter responses with low conditional entropy (CENT) are shown to be highly effective in image classification, and can be used as generic features for effective, noise resistant transfer learning.
Accelerating Convolutional Neural Network Using Discrete Orthogonal Transforms
All experiments are implemented in Python using the PyTorch and Torch-DCT libraries in the Google Colab environment; the Hartley and Cosine transform implementations, listed in Table 1, do not use the same optimizations as the FFT.


Fast Training of Convolutional Networks through FFTs
This work presents a simple algorithm which accelerates training and inference by a significant factor, and can yield improvements of over an order of magnitude compared to existing state-of-the-art implementations.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Fast Convolutional Nets With fbfft: A GPU Performance Evaluation
We examine the performance profile of Convolutional Neural Network training on the current generation of NVIDIA Graphics Processing Units, and introduce two new Fast Fourier Transform convolution implementations.
Caffe: Convolutional Architecture for Fast Feature Embedding
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.
FFTW: an adaptive software architecture for the FFT
  • Matteo Frigo, Steven G. Johnson
  • Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
  • 1998
FFTW is an adaptive FFT program that tunes the computation automatically for any particular hardware; tests show that its self-optimizing approach usually yields significantly better performance than all other publicly available FFT software.
ImageNet: A large-scale hierarchical image database
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.