# The Design and Implementation of FFTW3

@article{Frigo2005TheDA, title={The Design and Implementation of FFTW3}, author={Matteo Frigo and Steven G. Johnson}, journal={Proceedings of the IEEE}, year={2005}, volume={93}, pages={216-231} }

FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction…
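The idea of adapting to the hardware can be illustrated with a toy "planner" that times several candidate algorithms on the current machine and keeps the fastest. This is only a minimal sketch of the general idea, not FFTW's actual planner: the names `naive_dft`, `radix2_fft`, and `make_plan` are hypothetical, and the candidate set is limited to two textbook algorithms.

```python
import cmath
import time


def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def radix2_fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = radix2_fft(x[0::2])
    odd = radix2_fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def make_plan(n, trials=3):
    """Hypothetical planner: time each applicable candidate, return the fastest."""
    candidates = [naive_dft]
    if (n & (n - 1)) == 0:  # radix-2 only applies to power-of-two sizes
        candidates.append(radix2_fft)
    probe = [complex(i % 7, 0) for i in range(n)]  # arbitrary test input
    best, best_time = None, float("inf")
    for candidate in candidates:
        start = time.perf_counter()
        for _ in range(trials):
            candidate(probe)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = candidate, elapsed
    return best


# Plan once for a given size, then reuse the chosen routine many times.
plan = make_plan(64)
spectrum = plan([complex(i, 0) for i in range(64)])
```

Which candidate wins depends on the machine and the transform size, which is the point of measuring rather than fixing the choice in advance.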

## 4,561 Citations

Fast Fourier Transform in Large-Scale Systems

- Computer Science
- The Art of High Performance Computing for Computational Science, Vol. 1
- 2019

This chapter presents an introduction to the basis of the FFT and its implementation in parallel computing, and provides up-to-date computational techniques relevant to the FFT in state-of-the-art processors.

The Fastest Fourier Transform in the South

- Computer Science
- IEEE Transactions on Signal Processing
- 2013

FFTS is a discrete Fourier transform library that achieves state-of-the-art performance using a new cache-oblivious algorithm implemented with run-time specialization, and is, in almost all cases, faster than self-tuning libraries such as FFTW, and even vendor-tuned libraries such as Intel IPP and Apple vDSP.

FFT Implementation on a Streaming Architecture

- Computer Science
- 2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing
- 2011

This paper proposes an efficient implementation of the FFT with AMD's Brook+ language, describing several features and optimization strategies, and analyzing the scalability and performance compared to other well-known existing solutions.

High performance implementation of the inverse TFT

- Computer Science
- PASCO
- 2015

A high-performance implementation of the inverse truncated Fourier transform is reported. The inverse transform poses additional challenges compared to the forward transform, and the implementation provides significant performance improvement over zero-padding approaches even when high-performance FFT libraries are used.

Generating symmetric DFTs and equivariant FFT algorithms

- Computer Science
- ISSAC '07
- 2007

This paper presents a code generator which produces efficient implementations of multi-dimensional fast Fourier transform (FFT) algorithms which utilize symmetries in the input data to reduce memory…

Automatic Tuning for Parallel FFTs

- Computer Science
- Software Automatic Tuning, From Concepts to State-of-the-Art Results
- 2010

The performance results demonstrate that the proposed implementation of parallel FFTs with automatic performance tuning is effective at improving performance.

IMPLEMENTATION OF FFT ALGORITHM

- Computer Science
- 2017

An efficient algorithm to compute 8 point FFT has been devised in which a butterfly unit computes the output and then feeds those outputs as inputs to the next butterfly units so as to compute the overall FFT.

A Fast Algorithm With Less Operations for Length- DFTs

- Computer Science
- 2015

A fast Fourier transform (FFT) algorithm for computing length- DFTs that achieves reduction of arithmetic complexity over the related algorithms.

FFTSS: A High Performance Fast Fourier Transform Library

- Computer Science
- 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings
- 2006

A new fast Fourier transform library is introduced which provides the source code which compilers can optimize easily to achieve high performance on various processors.

An Implementation of Parallel 1-D FFT on the K Computer

- Computer Science
- 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems
- 2012

The proposed implementation of a parallel one-dimensional fast Fourier transform (FFT) on the K computer is based on the six-step FFT algorithm, which can be altered into the recursive six-step FFT algorithm to reduce the number of cache misses.

## References

FFTW: an adaptive software architecture for the FFT

- Computer Science
- Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
- 1998

An adaptive FFT program that tunes the computation automatically for any particular hardware, and tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software.

A fast Fourier transform compiler

- Computer Science
- PLDI '99
- 1999

The internals of this special-purpose compiler, called genfft, are described in some detail, and it is argued that a specialized compiler is a valuable tool.

A Comprehensive DFT API for Scientific Computing

- Computer Science
- The Architecture of Scientific Software
- 2000

This paper forms an API for DFT computation that encompasses all the functionality offered by a number of popular packages combined, allows easy porting from existing codes, and exhibits a systematic naming convention with relatively short calling sequences.

Architecture independent short vector FFTs

- Computer Science
- 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
- 2001

This paper introduces an SIMD vectorization for FFTW, the "fastest Fourier transform in the west" proposed by Frigo and Johnson (see Proceedings of the ACM SIGPLAN '99, p. 169-180, 1999). The new…

Real-valued fast Fourier transform algorithms

- Computer Science
- IEEE Trans. Acoust. Speech Signal Process.
- 1987

A new implementation of the real-valued split-radix FFT is presented, an algorithm that uses fewer operations than any other real-valued power-of-2-length FFT.

On computing the split-radix FFT

- Engineering
- IEEE Trans. Acoust. Speech Signal Process.
- 1986

This paper presents an efficient Fortran program that computes the Duhamel-Hollmann split-radix FFT, which seems to require the least total arithmetic of any power-of-two DFT algorithm.

Self-Sorting In-Place Fast Fourier Transforms

- Computer Science
- SIAM J. Sci. Comput.
- 1991

It is shown how the familiar radix-2 Fast Fourier Transform algorithm can be extended to radix-3, radix-4, radix-5, and finally to mixed-radix FFTs, and how these new versions of the FFT require neither an unscrambling step nor work space.

A linear filtering approach to the computation of discrete Fourier transform

- Computer Science
- 1970

It is shown that the discrete equivalent of a chirp filter is needed to implement the computation of the discrete Fourier transform (DFT) as a linear filtering process, and that use of the conventional FFT permits the computations in a time proportional to $N \log_{2} N$ for any $N$.
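The chirp-filter idea summarized above can be sketched in a few lines. The version below is a minimal illustration of the standard chirp-z (Bluestein) formulation, assuming the identity $nk = (n^2 + k^2 - (k-n)^2)/2$ to rewrite the DFT as a convolution with a chirp; it is not taken from the cited paper, and the names `fft_pow2` and `bluestein_dft` are hypothetical.

```python
import cmath


def fft_pow2(x, sign=-1):
    """Recursive radix-2 FFT (sign=-1) or unnormalized inverse (sign=+1)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], sign)
    odd = fft_pow2(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """DFT of arbitrary length n via a chirp convolution of power-of-two size."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:  # padded length must hold the full linear convolution
        m *= 2
    # chirp[k] = exp(-i*pi*k^2/n); k^2 is reduced mod 2n to limit rounding error
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    # Pre-multiply the input by the chirp and zero-pad to length m.
    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    # The filter is the conjugate chirp, wrapped so that index -k maps to m - k.
    b = [0j] * m
    for k in range(n):
        b[k] = chirp[k].conjugate()
        if k:
            b[m - k] = b[k]
    # Circular convolution through three power-of-two FFTs.
    fa = fft_pow2(a)
    fb = fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], sign=+1)
    # Post-multiply by the chirp; divide by m to normalize the inverse FFT.
    return [chirp[k] * conv[k] / m for k in range(n)]


# A length-5 transform, which a plain radix-2 FFT cannot handle directly.
spectrum = bluestein_dft([1, 2, 3, 4, 5])
```

Because every FFT in the sketch has power-of-two length, the overall cost stays proportional to $N \log_2 N$ even when $N$ is prime.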