Learn More
— In 2011, Oak Ridge National Laboratory began an upgrade to Jaguar to convert it from a Cray XT5 to a Cray XK6 system named Titan. This is being accomplished in two phases. The first phase, completed in early 2012, replaced all of the XT5 compute blades with XK6 compute blades, and replaced the SeaStar interconnect with Cray's new Gemini network. Each(More)
OpenMP has gained wide popularity as an API for parallel programming on shared memory and distributed shared memory platforms. Despite its broad availability, there remains a need for a portable, robust, open source, optimizing OpenMP compiler for C/C++/Fortran 90, especially for teaching and research, e.g. into its use on new target architectures, such as(More)
The OpenUH compiler is a branch of the open source Open64 compiler suite for C, C++, and Fortran 95/2003, with support for a variety of targets including x86_64, IA-64, and IA-32. For the past several years, we have used OpenUH to conduct research in parallel programming models and their implementation, static and dynamic analysis of parallel applications,(More)
We describe using OpenMP to compute δ-hyperbolicity, a quantity of interest in social and information network analysis, at a scale that uses up to 1000 threads. By considering both OpenMP workshare and tasking models to parallelize the computations, we find that multiple task levels permits finer grained tasks at runtime and results in better performance at(More)
New parallel computers are emerging, but developing efficient scientific code for them remains difficult. A scientist must manage not only the science-domain complexity but also the performance-optimization complexity. HERCULES is a code transformation system designed to help the scientist to separate the two concerns, which improves code maintenance, and(More)
Two-photon excitation with temporally focused pulses can be combined with phase-modulation approaches, such as computer-generated holography and generalized phase contrast, to efficiently distribute light into two-dimensional, axially confined, user-defined shapes. Adding lens-phase modulations to 2D-phase holograms enables remote axial pattern displacement(More)
A so-called SPMD style OpenMP program can achieve scalability on ccNUMA systems by means of array privatization, and earlier research has shown good performance under this approach. Since it is hard to write SPMD OpenMP code, we showed a strategy for the automatic translation of many OpenMP constructs into SPMD style in our previous work. In this paper, we(More)
OpenMP relies heavily on barrier synchronization to coordinate the work of threads that are performing the computations in a parallel region. A good implementation of barriers is thus an important part of any implementation of this API. As the number of cores in shared and distributed shared memory machines continues to grow, the quality of the barrier(More)
—Automating the process of parallel performance experimentation, analysis, and problem diagnosis can enhance environments for performance-directed application development, compilation, and execution. This is especially true when paramet-ric studies, modeling, and optimization strategies require large amounts of data to be collected and processed for(More)