Hybrid and 4-D FFT implementations of an open-source parallel FFT package OpenFFT
The fast Fourier transform (FFT) is a primitive kernel in numerous fields of science and engineering. OpenFFT is an open-source parallel package for 3-D FFTs, built on a communication-optimal domain decomposition method that minimizes the volume of communication. In this paper, we analyze, model, and tune the performance of OpenFFT, paying particular attention to tuning the communication that dominates the run time of large-scale calculations. We first analyze its performance on different machines to gain a thorough understanding of the behavior of both the package and the machines. We then build a performance model of OpenFFT on these machines, dividing the run time into computation and communication and modeling the network overhead. Based on the performance analysis, we develop six communication methods, with the aim of covering varied calculation scales on a wide variety of computational platforms. OpenFFT is then augmented with auto-tuning of communication, which selects the best method at run time according to their performance. Numerical results demonstrate that the optimized OpenFFT delivers good performance in comparison with other state-of-the-art packages at different computational scales on a number of parallel machines. The performance model is also useful for performance prediction and understanding.
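The run-time selection among communication methods described above can be illustrated with a minimal sketch. It is not OpenFFT's actual code: the names `autotune`, `wall_time`, `candidates`, and `trials` are hypothetical, and each candidate is assumed to be a callable that performs one full communication step. The sketch times every candidate a few times and keeps the one with the smallest best-case time.

```python
import time
from typing import Callable, Dict


def wall_time(run: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to `run`."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def autotune(
    candidates: Dict[str, Callable[[], object]],
    trials: int = 3,
    measure: Callable[[Callable[[], object]], float] = wall_time,
) -> str:
    """Pick the candidate with the smallest best-of-`trials` measured cost.

    `candidates` maps a (hypothetical) method name to a callable that
    executes that communication method once. `measure` is injectable so
    the selection logic can be checked without real timing noise.
    """
    if not candidates:
        raise ValueError("no candidate methods given")
    if trials < 1:
        raise ValueError("trials must be at least 1")

    best_name = None
    best_cost = float("inf")
    for name, run in candidates.items():
        # Taking the minimum over several trials reduces the influence
        # of one-off delays such as OS jitter or network contention.
        cost = min(measure(run) for _ in range(trials))
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name
```

In a distributed setting, every process would additionally need to agree on the same choice, for example by reducing the measured times across processes before comparing them; that step is omitted here.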