Collective communication operations, used by many scientific applications, tend to limit overall parallel application performance and scalability. Computer systems are becoming more heterogeneous with increasing node and core-per-node counts. Also, a growing number of data-access mechanisms, of varying characteristics, are supported within a single computer…
This paper introduces the newly developed InfiniBand (IB) Management Queue capability, used by the Host Channel Adapter (HCA) to manage network task data-flow dependencies and progress the communications associated with such flows. These tasks include sends, receives, and the newly supported wait task, and are scheduled by the HCA based on a data…
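A minimal sketch of the task-chaining idea this abstract describes; the mq_* types and the build_chain helper below are hypothetical illustrations, not the actual IB verbs or Management Queue API:

```c
/* Hypothetical sketch of an HCA-managed task list: a receive, a wait task
 * that fires when the receive completes, then a dependent send. The mq_*
 * names are illustrative only; the real Management Queue interface is
 * exposed through vendor InfiniBand verbs extensions. */
#include <stddef.h>

typedef enum { MQ_TASK_SEND, MQ_TASK_RECV, MQ_TASK_WAIT } mq_task_kind;

typedef struct mq_task {
    mq_task_kind    kind;
    void           *buf;   /* data buffer for send/recv tasks   */
    size_t          len;   /* buffer length in bytes            */
    struct mq_task *dep;   /* task this entry waits on, or NULL */
} mq_task;

/* Build the chain recv -> wait(recv) -> send; once posted, the HCA
 * schedules each task as its dependency completes, with no host CPU
 * involvement in progressing the flow. */
static void build_chain(mq_task *recv, mq_task *wait, mq_task *send,
                        void *in, void *out, size_t len)
{
    recv->kind = MQ_TASK_RECV; recv->buf = in;   recv->len = len; recv->dep = NULL;
    wait->kind = MQ_TASK_WAIT; wait->buf = NULL; wait->len = 0;   wait->dep = recv;
    send->kind = MQ_TASK_SEND; send->buf = out;  send->len = len; send->dep = wait;
}
```

Because the adapter resolves the wait dependency itself, the whole chain progresses without host involvement, which is what lets the HCA drive the communication flow.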
This paper explores the computation and communication overlap capabilities enabled by the new CORE-Direct hardware capabilities introduced in the InfiniBand Network Interface Card (NIC) ConnectX-2. We use the latency-dominated nonblocking barrier algorithm in this study, and find that at a 64-process count, a contiguous time slot of about 80% of the…
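The overlap window the study measures can be pictured with the standard MPI-3 nonblocking barrier; this sketch interleaves computation with completion polling (the paper's own barrier is offloaded via CORE-Direct rather than implemented with MPI_Ibarrier):

```c
/* Overlap pattern: start a nonblocking barrier, then interleave
 * computation with completion polling. MPI_Ibarrier is the standard
 * MPI-3 call; the paper's barrier is offloaded via CORE-Direct. */
#include <mpi.h>

static void do_useful_work(void) { /* a slice of application computation */ }

static void overlapped_barrier(void)
{
    MPI_Request req;
    int done = 0;

    MPI_Ibarrier(MPI_COMM_WORLD, &req);             /* start the barrier   */
    while (!done) {
        do_useful_work();                           /* overlap computation */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);   /* poll for completion */
    }
}
```

With the collective progressed by the HCA, most of the barrier latency becomes usable computation time, consistent with the roughly 80% contiguous slot the abstract mentions.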
High performance computing (HPC) has begun scaling beyond the Petaflop range towards the Exaflop (1000 Petaflops) mark. One of the major concerns throughout the development toward such performance capability is scalability, both at the system level and at the application layer. In this paper we present a novel approach to a new design concept, the co-design…
This paper describes the design and implementation of InfiniBand (IB) CORE-Direct based blocking and nonblocking broadcast operations within the Cheetah collective operation framework. It describes a novel approach that fully offloads collective operations and employs only user-supplied buffers. For a 64-rank communicator, the latency of…
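Cheetah's broadcast is internal to that framework; as a rough stand-in, the user-visible shape of a nonblocking broadcast out of a user-supplied buffer resembles the standard MPI-3 MPI_Ibcast:

```c
/* A nonblocking broadcast from rank 0 out of a user-supplied buffer,
 * with computation overlapping the collective. MPI_Ibcast is the
 * standard MPI-3 call; Cheetah's offloaded broadcast is internal. */
#include <mpi.h>

static void nonblocking_bcast(double *buf, int count)
{
    MPI_Request req;

    MPI_Ibcast(buf, count, MPI_DOUBLE, 0, MPI_COMM_WORLD, &req);
    /* ... computation that overlaps the broadcast ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```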
OpenSHMEM is an effort to standardize the well-known SHMEM parallel programming library. The project aims to produce an open-source and portable SHMEM API and is led by ORNL and UH. In this paper, we optimize the current OpenSHMEM reference implementation, based on GASNet, to achieve higher performance characteristics. To achieve these desired performance…
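A minimal, self-contained example of the style of code such an implementation must serve; it uses the shmem_init/shmem_malloc names from later spec revisions (the 1.1-era reference implementation spells these start_pes and shmalloc):

```c
/* PE 0 writes a value into a symmetric buffer on PE 1 with a one-sided
 * put. Run with at least two PEs. */
#include <shmem.h>
#include <stdio.h>

int main(void)
{
    shmem_init();
    int me = shmem_my_pe();

    /* Symmetric allocation: collective, same layout on every PE. */
    long *dest = (long *)shmem_malloc(sizeof(long));
    *dest = -1;
    shmem_barrier_all();

    if (me == 0) {
        long src = 42;
        shmem_long_put(dest, &src, 1, 1);   /* one-sided put to PE 1 */
    }
    shmem_barrier_all();

    if (me == 1)
        printf("PE 1 received %ld\n", *dest);

    shmem_free(dest);
    shmem_finalize();
    return 0;
}
```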
Clusters featuring the InfiniBand interconnect are continuing to scale. As an example, the “Ranger” system at the Texas Advanced Computing Center (TACC) includes over 60,000 cores with nearly 4,000 InfiniBand ports. The latest Top500 list shows 30% of systems and over 50% of the top 100 are now using InfiniBand as the compute node…
Many scientific simulations using the Message Passing Interface (MPI) programming model are sensitive to the performance and scalability of reduction collective operations such as MPI_Allreduce and MPI_Reduce. These operations are the most widely used abstractions for performing mathematical operations over all processes that are part of the simulation. In…
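A minimal example of the reduction collectives in question: every rank contributes a partial value and MPI_Allreduce delivers the combined result to all of them.

```c
/* Each rank contributes a partial value; MPI_Allreduce combines them
 * and returns the global sum to every rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local  = (double)rank;   /* this rank's partial result */
    double global = 0.0;

    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}
```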
OpenSHMEM is an open standard for SHMEM libraries. With the standardisation process complete, the community is looking towards extending the API for increased programmer flexibility and extreme scalability. According to the current OpenSHMEM specification (revision 1.1), allocation of symmetric memory is collective across all PEs executing the application.
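A short sketch of the collectivity constraint the abstract refers to: every PE must reach the matching allocation call with the same size (revision 1.1 names the routine shmalloc; the shmem_malloc/shmem_free spellings below are from later revisions):

```c
/* Symmetric allocation is collective: every PE must make this call
 * with the same size, and each gets a buffer at a symmetric address.
 * Skipping the call on any PE is erroneous under these semantics. */
#include <shmem.h>

int main(void)
{
    shmem_init();

    double *work = (double *)shmem_malloc(1024 * sizeof(double));

    /* ... one-sided communication into/out of `work` ... */

    shmem_free(work);    /* also collective across all PEs */
    shmem_finalize();
    return 0;
}
```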