cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Yu-hsuan Shih, Garrett Wright, Joakim Andén, Johannes Blaschke, Alex H. Barnett
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Nonuniform fast Fourier transforms dominate the computational cost in many applications including image reconstruction and signal processing. We thus present a general-purpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given user-requested accuracy, regardless of the distribution of nonuniform points, via cache-aware point reordering, and load… 
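For orientation, the two transform types named in the abstract can be written as direct sums over the nonuniform points. The sketch below is not cuFINUFFT's algorithm or API. It is a slow reference evaluation in pure Python, under assumed conventions: point coordinates are angles in radians, integer modes are centered on zero, and the exponent sign is a parameter. The function names `mode_range`, `nudft2d_type1`, and `nudft2d_type2` are hypothetical.

```python
import cmath

def mode_range(n):
    """Assumed mode ordering: n integer frequencies centered on zero."""
    return range(-(n // 2), (n - 1) // 2 + 1)

def nudft2d_type1(x, y, c, n1, n2, sign=-1):
    """Type 1 (nonuniform to uniform), direct sum with O(M * n1 * n2) cost.

    F[k1, k2] = sum_j c[j] * exp(i * sign * (k1 * x[j] + k2 * y[j]))
    """
    return [[sum(cj * cmath.exp(1j * sign * (k1 * xj + k2 * yj))
                 for xj, yj, cj in zip(x, y, c))
             for k2 in mode_range(n2)]
            for k1 in mode_range(n1)]

def nudft2d_type2(x, y, f, sign=+1):
    """Type 2 (uniform to nonuniform), direct sum with O(M * n1 * n2) cost.

    c[j] = sum_{k1, k2} f[k1, k2] * exp(i * sign * (k1 * x[j] + k2 * y[j]))
    """
    n1, n2 = len(f), len(f[0])
    return [sum(f[i1][i2] * cmath.exp(1j * sign * (k1 * xj + k2 * yj))
                for i1, k1 in enumerate(mode_range(n1))
                for i2, k2 in enumerate(mode_range(n2)))
            for xj, yj in zip(x, y)]
```

With opposite signs the two sums are adjoints of each other, which gives a convenient correctness check for any fast implementation: the inner product of a type 2 output with arbitrary strengths should match the inner product of the input coefficients with the type 1 output of those strengths.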

Training Adaptive Reconstruction Networks for Inverse Problems
Neural networks hold great promise for solving ill-posed inverse problems. In particular, physics-informed learning approaches already seem to be gradually replacing carefully…
Scaling and Acceleration of Three-dimensional Structure Determination for Single-Particle Imaging Experiments with SpiniFEL
This work describes how the mathematical framework of SpiniFEL was reformulated for parallelizable implementation, how the most compute-intensive parts of the application were accelerated, and how the result compares with the existing MPI+GPU implementation.


Implementing Fast MRI Gridding on GPUs via CUDA
This work implemented the non-equispaced fast Fourier transform algorithm, commonly known as ‘gridding’, on a GeForce 8800 GPU using NVIDIA’s CUDA framework and found that optimizations in thread scheduling, data structures, and memory access patterns could accelerate a naïve GPU implementation by over 400%.
A parallel non-uniform fast Fourier transform library based on an "exponential of semicircle" kernel
FINUFFT is presented, an efficient parallel library for type 1 (nonuniform to uniform), type 2 (uniform to nonuniform), or type 3 (nonuniform to nonuniform) transforms, in dimensions 1, 2, or 3, which uses minimal RAM, requires no precomputation or plan steps, and has a simple interface to several languages.
gpuNUFFT - An Open Source GPU Library for 3D Regridding with Direct Matlab Interface
The goal of this work is to introduce gpuNUFFT, a new open-source 3D regridding GPU library with a built-in Matlab interface that is straightforward to include in all implementations of iterative image reconstruction.
The nonequispaced FFT on graphics processing units
This work parallelized the nonequispaced FFT using the CUDA FFT library and a dedicated parallelization of the approximation scheme.
A GPU acceleration of 3-D Fourier reconstruction in cryo-EM
A novel graphics processing unit (GPU)-friendly algorithm is presented for direct Fourier reconstruction, one of the main computational bottlenecks in the 3-D volume reconstruction pipeline for some experimental cases (particularly those with a large number of images and a high internal symmetry).
Accelerating the Nonuniform Fast Fourier Transform
This paper observes that one of the standard interpolation or "gridding" schemes, based on Gaussians, can be accelerated by a significant factor without precomputation and storage of the interpolation weights, which is of particular value in two- and three-dimensional settings.
Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU)
A Python non-uniform fast Fourier transform (PyNUFFT) package has been developed to accelerate multidimensional non-Cartesian image reconstruction on heterogeneous platforms. It provides several solvers, including the conjugate gradient method, l1 total variation regularized ordinary least squares (L1TV-OLS), and l1 total variation regularized least absolute deviation (L1TV-LAD).
The type 3 nonuniform FFT and its applications
The nonequispaced or nonuniform fast Fourier transform (NUFFT) arises in a variety of application areas, including image processing and the numerical solution of partial differential equations. In…