Douglas Miles

The fast Fourier transform (FFT) is a challenging algorithm to implement efficiently on a parallel computer. Recent algorithm advances have led to greatly improved FFT performance on parallel vector computers such as the CRAY-2 and CRAY Y-MP. Variations on these techniques can be used to extend this improved performance to other parallel architectures. A…
PGI Fortran, C and C++ compilers and tools are available on most Cray XT3 and Cray XD1 systems. Optimizing performance of the AMD Opteron processors in these systems often depends on maximizing SSE vectorization, ensuring alignment of vectors, and minimizing the number of cycles the processors are stalled waiting on data from main memory. The PGI compilers…
Today, most CPU+Accelerator systems incorporate NVIDIA GPUs. Intel Xeon Phi and the continued evolution of AMD Radeon GPUs make it likely we will soon see, and want to program, a wider variety of CPU+Accelerator systems. PGI already supports NVIDIA GPUs and is working to add support for Xeon Phi and AMD Radeon. Here we explore the features common to all…