# Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add

```bibtex
@inproceedings{Highlander2015VeryET,
  title     = {Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add},
  author    = {Tyler Highlander and Andres Rodriguez},
  booktitle = {BMVC},
  year      = {2015}
}
```

Convolutional neural networks (CNNs) are currently state-of-the-art for various classification tasks, but are computationally expensive. Propagating through the convolutional layers is very slow, as each kernel in each layer must sequentially calculate many dot products for a single forward and backward propagation, which equates to $\mathcal{O}(N^{2}n^{2})$ per kernel per layer, where the inputs are $N \times N$ arrays and the kernels are $n \times n$ arrays. Convolution can be efficiently…
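The approach the abstract describes can be sketched numerically: direct 2-D convolution versus FFT-based convolution with overlap-and-add, which splits the input into blocks, convolves each block by pointwise multiplication in the Fourier domain, and sums the overlapping partial results. The sizes below and the use of SciPy's `oaconvolve` are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy import signal

# Illustrative sizes: a 32x32 input (N = 32) and a 5x5 kernel (n = 5).
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
k = rng.standard_normal((5, 5))

# Direct 2-D convolution: O(N^2 n^2) multiply-adds per kernel.
direct = signal.convolve2d(x, k, mode="full")

# Overlap-and-add convolution: the input is partitioned into blocks,
# each block is convolved via FFTs (pointwise products in the Fourier
# domain), and the partial outputs are summed where they overlap.
oaa = signal.oaconvolve(x, k, mode="full")

# Both methods compute the same linear convolution up to
# floating-point error.
print(np.allclose(direct, oaa))
```

For small kernels on large inputs, overlap-and-add keeps the FFT sizes close to the kernel size rather than padding the kernel up to the full input size, which is the source of the savings the paper targets.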


## 40 Citations

End-to-End Training of Deep Neural Networks in the Fourier Domain

- Computer Science, Mathematics
- 2022

This paper presents a method for implementing neural networks entirely in the Fourier domain, thereby saving multiplications and inverse Fourier transform operations.

Accelerating Convolutional Neural Network With FFT on Embedded Hardware

- Computer Science, IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2018

Three variations of convolution are evaluated for popular CNN networks on embedded hardware: direct convolution, fast Fourier transform-based convolution (FFT-Conv), and FFT overlap-and-add convolution. The evaluation explores the trade-offs between software and hardware implementation, domain-specific logic and instructions, and various forms of parallelism across different architectures.

A Fast Algorithm for Convolutional Neural Networks Using Tile-based Fast Fourier Transforms

- Computer Science, Neural Processing Letters
- 2019

The tile-based decomposition strategy is introduced into Fourier transforms to yield a fast convolution algorithm, called tFFT, which reduces the arithmetic complexity of CNNs by over a factor of 3 compared to FFT-based convolution algorithms.

Acceleration of Convolutional Neural Network Using FFT-Based Split Convolutions

- Computer Science, ArXiv
- 2020

A new method for CNN processing in the FFT domain is proposed, based on input splitting, which reduces the complexity of the required FFT computations and increases efficiency.

Fast Convolutional Neural Networks with Fine-Grained FFTs

- Computer Science, PACT
- 2020

This paper analyzes the origin of the redundancy generated by the im2col process and reveals a new data pattern that describes the matrix representation of convolution more concisely, which is implemented as an FFT-based convolution with finer FFT granularity.

Low-memory GEMM-based convolution algorithms for deep neural networks

- Computer Science, ArXiv
- 2017

Two novel GEMM-based algorithms are presented that require just a fraction of the additional memory for DNN convolution, making them much more suitable for memory-limited embedded systems.

Spectral-based convolutional neural network without multiple spatial-frequency domain switchings

- Computer Science, Neurocomputing
- 2019

Activating frequencies: Exploring non-linearities in the Fourier domain

- Computer Science
- 2018

This thesis is the first published study of activation functions in the frequency domain of CNNs, and several potential candidates for an activation function that works directly in the frequency domain are studied.

Modeling information flow through deep convolutional neural networks

- Computer Science
- 2020

Filter responses with low conditional entropy (CENT) are shown to be highly effective in image classification, and can be used as generic features for effective, noise resistant transfer learning.

Accelerating Convolutional Neural Network Using Discrete Orthogonal Transforms

- Computer Science
- 2021

All experiments are implemented in Python using the PyTorch and Torch-DCT libraries in the Google Colab environment; the implementations of the Hartley and Cosine transforms, listed in Table 1, do not use the same optimizations as the FFT.

## References

Showing 1-10 of 16 references

Fast Training of Convolutional Networks through FFTs

- Computer Science, ICLR
- 2014

This work presents a simple algorithm which accelerates training and inference by a significant factor, and can yield improvements of over an order of magnitude compared to existing state-of-the-art implementations.

ImageNet classification with deep convolutional neural networks

- Computer Science, Commun. ACM
- 2012

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Fast Convolutional Nets With fbfft: A GPU Performance Evaluation

- Computer Science, ICLR
- 2015

We examine the performance profile of Convolutional Neural Network training on the current generation of NVIDIA Graphics Processing Units. We introduce two new Fast Fourier Transform convolution…

Caffe: Convolutional Architecture for Fast Feature Embedding

- Computer Science, ACM Multimedia
- 2014

Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Going deeper with convolutions

- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition…

Learning Multiple Layers of Features from Tiny Images

- Computer Science
- 2009

It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

- Computer Science, ICLR
- 2014

This integrated framework for using Convolutional Networks for classification, localization, and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.

FFTW: an adaptive software architecture for the FFT

- Computer Science, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
- 1998

An adaptive FFT program is presented that tunes the computation automatically for any particular hardware; tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software.

ImageNet: A large-scale hierarchical image database

- Computer Science, 2009 IEEE Conference on Computer Vision and Pattern Recognition
- 2009

A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.