This is the seventeenth annual review published in JAAS of the application of atomic spectroscopy to the chemical analysis of environmental samples. The review follows on directly from last year's (J. …), although this year it incorporates a number of changes in structure. The most obvious and perhaps the most important change is the removal of some of the…
The fast Fourier transform (FFT) is a challenging algorithm to implement efficiently on a parallel computer. Recent algorithm advances have led to greatly improved FFT performance on parallel vector computers such as the CRAY-2 and CRAY Y-MP. Variations on these techniques can be used to extend this improved performance to other parallel architectures. A…
This paper provides an introduction to programming accelerators using the PGI OpenACC implementation in Fortran and C, which is based on OpenACC API version 1.0. The paper explains the use of the data construct and compares the use of the parallel and kernels constructs. PGI-specific extensions and features, as well as compiler and runtime options, are shown.
An innovative programming environment for distributed computing has been developed for the new FPS T Series Parallel Vector Supercomputer. This programming model provides structured asynchronous communication routines, dynamic configuration of processing nodes into application topologies, and external data partitioning and distribution. The asynchronous…
Today, most CPU+Accelerator systems incorporate NVIDIA GPUs. Intel Xeon Phi and the continued evolution of AMD Radeon GPUs make it likely we will soon see, and want to program, a wider variety of CPU+Accelerator systems. PGI already supports NVIDIA GPUs, and is working to add support for Xeon Phi and AMD Radeon. Here we explore the features common to all…
PGI Fortran, C and C++ compilers and tools are available on most Cray XT3 and Cray XD1 systems. Optimizing performance of the AMD Opteron processors in these systems often depends on maximizing SSE vectorization, ensuring alignment of vectors, and minimizing the number of cycles the processors are stalled waiting on data from main memory. The PGI compilers…
At CUG 2006, a cache-oblivious implementation of a two-dimensional Lagrangian hydrodynamics model of a single ideal gas material was presented. This paper presents further optimizations to this C++ application to allow packed, consecutive-element storage of vectors, some restructuring of loops containing neighborhood operations, and adding type qualifiers…