Scaling Parallel 3-D FFT with Non-Blocking MPI Collectives

Sukhyun Song and Jeffrey K. Hollingsworth
2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
This paper describes a new method for scalable, high-performance parallel 3-D FFT. We use a 2-D decomposition of 3-D arrays so that the code scales to a large number of cores. To achieve high performance, we use non-blocking MPI all-to-all operations and exploit computation-communication overlap. We also auto-tune our 3-D FFT code efficiently over a large parameter space, which lets us cope with the complex trade-offs involved in optimizing the code for different system environments. According to experimental…
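As a rough illustration of the 2-D decomposition mentioned in the abstract, the sketch below computes which part of an n x n x n array each process would own on a p_rows x p_cols process grid. It is not the authors' code: the function names (`block_range`, `pencil_extents`, `max_processes`), the rank-to-grid mapping, and the choice of which axes are split are all assumptions made for this example. It also shows the commonly cited reason a 2-D decomposition scales further than a 1-D one: splitting one axis allows at most n processes, while splitting two axes allows up to n^2.

```python
def block_range(n, parts, index):
    """Half-open range [start, stop) of block `index` when n items are
    split into `parts` contiguous, nearly equal blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def pencil_extents(n, p_rows, p_cols, rank):
    """Local (x, y, z) extents of an n x n x n array for `rank`.

    Assumed layout (illustrative only): x is kept whole on every process,
    y is split across grid rows, and z is split across grid columns.
    """
    row, col = divmod(rank, p_cols)
    return (0, n), block_range(n, p_rows, row), block_range(n, p_cols, col)


def max_processes(n, dims_split):
    """Upper bound on usable processes when `dims_split` axes are divided."""
    return n ** dims_split


if __name__ == "__main__":
    n, p_rows, p_cols = 8, 2, 4
    for rank in range(p_rows * p_cols):
        print(rank, pencil_extents(n, p_rows, p_cols, rank))
    print("1-D decomposition limit:", max_processes(n, 1))
    print("2-D decomposition limit:", max_processes(n, 2))
```

In a real implementation each process would run 1-D FFTs along its whole axis and then exchange data with an all-to-all step to make the next axis local. That exchange is the point at which the non-blocking collectives described in the abstract would come into play.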


