Effectiveness of partitioning strategies of Fast Fourier Transform in GPU implementations


In this paper, the authors compare the effectiveness of several graphics processing unit (GPU) implementations of the Radix-2 Decimation in Time (DIT) Fast Fourier Transform (FFT) algorithm that differ in how the calculations are distributed among the GPU's computational resources. The experiments show that partitioning the FFT computational scheme into 4-point and 8-point subtransform blocks, within which the calculations are performed sequentially in a single instruction, multiple thread (SIMT) manner, results in significantly faster execution on the selected GPU architectures than the standard parallel approach based on the 2-point butterfly operation. Moreover, the proposed partitioning scheme can be extended to an arbitrary subtransform block size for a given N-point Radix-2 FFT implementation dedicated to a particular GPU architecture.
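The 4-point partitioning idea can be illustrated with a minimal sketch. It is not the authors' GPU code: it is a sequential Python model, written under the assumption that each 4-point block fuses two consecutive radix-2 DIT stages and would be handled by one GPU thread. The names `fft_blocks4`, `block4`, `butterfly2`, and `bit_reverse` are hypothetical and chosen only for illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def butterfly2(x, i, k, w):
    """Standard 2-point DIT butterfly on x[i], x[k] with twiddle factor w."""
    a, b = x[i], w * x[k]
    x[i], x[k] = a + b, a - b


def block4(x, base, t, h):
    """Two fused radix-2 stages (half-sizes h and 2h) on four elements.

    In the assumed SIMT picture, one thread runs these four butterflies
    sequentially; t is the offset of `base` inside its group of 4*h elements.
    """
    i0, i1, i2, i3 = base, base + h, base + 2 * h, base + 3 * h
    w1 = cmath.exp(-2j * cmath.pi * t / (2 * h))         # stage with half-size h
    w2a = cmath.exp(-2j * cmath.pi * t / (4 * h))        # stage with half-size 2h
    w2b = cmath.exp(-2j * cmath.pi * (t + h) / (4 * h))
    butterfly2(x, i0, i1, w1)
    butterfly2(x, i2, i3, w1)
    butterfly2(x, i0, i2, w2a)
    butterfly2(x, i1, i3, w2b)


def fft_blocks4(signal):
    """Radix-2 DIT FFT computed in 4-point blocks; len(signal) must be a power of two."""
    n = len(signal)
    assert n >= 1 and n & (n - 1) == 0, "length must be a power of two"
    bits = n.bit_length() - 1
    x = [0j] * n
    for i, v in enumerate(signal):
        x[bit_reverse(i, bits)] = complex(v)

    h, stages_left = 1, bits
    while stages_left >= 2:
        # Each loop iteration is independent and stands in for one GPU thread.
        for block in range(n // 4):
            group, t = divmod(block, h)
            block4(x, group * 4 * h + t, t, h)
        h *= 4
        stages_left -= 2

    if stages_left == 1:
        # An odd number of stages leaves one ordinary 2-point butterfly stage.
        for b in range(n // 2):
            group, t = divmod(b, h)
            i = group * 2 * h + t
            butterfly2(x, i, i + h, cmath.exp(-2j * cmath.pi * t / (2 * h)))
    return x
```

In this model every pass over the data launches N/4 independent blocks instead of N/2 butterflies, so the number of passes is roughly halved. An 8-point variant would fuse three stages per block in the same way.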

Cite this paper

@article{Wieloch2017EffectivenessOP,
  title={Effectiveness of partitioning strategies of Fast Fourier Transform in GPU implementations},
  author={Kamil Wieloch and Kamil Stokfiszewski and Mykhaylo Yatsymirskyy},
  journal={2017 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT)},
  year={2017},
  volume={1},
  pages={322-325}
}