We present the architecture of the Deep Computing Messaging Framework (DCMF), a message passing runtime designed for the Blue Gene/P machine and other HPC architectures. DCMF has been designed to easily support several programming paradigms such as the Message Passing Interface (MPI), Aggregate Remote Memory Copy Interface (ARMCI), Charm++, and others. This…
The MPI Forum has ratified extensions to MPI RMA with a new flexible and high-performance passive target synchronization mechanism and new calls for window allocation and atomic operations executed on remote windows. In this paper, we explore an implementation of this new MPI-3.0 RMA interface on the Blue Gene/Q machine with performance results. We take…
The Blue Gene/Q machine is the next generation in the line of IBM massively parallel supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each BG/Q node having 68 hardware threads, hybrid programming paradigms, which use message passing among nodes and multi-threading within nodes, are ideal and will enable applications to…
This paper discusses the design and implementation of a one-sided communication interface for the IBM Blue Gene/L supercomputer. This interface facilitates ARMCI and the Global Arrays toolkit and can be used by other one-sided communication libraries. New protocols, interrupt-driven communication, and compute node kernel enhancements were required to enable…
This paper evaluates the performance of remote memory access (RMA) communication and its capabilities on the Blue Gene/P supercomputer. This study includes the high-performance implementation and performance of Global Arrays (GA) and its runtime system, the Aggregate Remote Memory Copy Interface (ARMCI). Our implementation of GA/ARMCI on Blue Gene/P is on top…
The Blue Gene/P (BG/P) supercomputer consists of thousands of compute nodes interconnected by multiple networks. Of these, a 3D torus equipped with a direct memory access (DMA) engine is the primary network. BG/P also features a collective network that supports hardware-accelerated collective operations such as broadcast and allreduce. One of the…