Modern high-performance applications require efficient and scalable collective communication operations. Currently, most collective operations are implemented on top of point-to-point operations. In this paper, we propose to use hardware multicast in InfiniBand to design fast and scalable broadcast operations in MPI. InfiniBand supports multicast with…
MPI Alltoall is one of the most communication-intensive collective operations used in many parallel applications. Recently, the supercomputing arena has witnessed phenomenal growth of commodity clusters built using InfiniBand and multi-core systems. In this context, it is important to optimize this operation for these emerging clusters to allow for good…
The advances in multicore technology and modern interconnects are rapidly increasing the number of cores deployed in today's commodity clusters. A majority of parallel applications written in MPI employ collective operations in their communication kernels. Optimizing these operations on multicore platforms is the key to obtaining good performance…
Large-scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP500 Supercomputer rankings. At the same time, the fat tree has become a popular interconnection topology for these clusters, since it provides multiple paths between any pair of nodes. However, even with a fat tree, hot-spots may occur in the network…
The rapid growth of InfiniBand, 10 Gigabit Ethernet/iWARP and IB WAN extensions is gaining momentum for designing high-end computing clusters and data-centers. For typical applications such as data staging, content replication and remote site backup, FTP has been the most popular method for transferring bulk data within and across these…
The All-to-All Broadcast collective operation is commonly used in parallel scientific applications. In the context of MPI, this collective operation is called MPI Allgather. Contemporary MPI implementations use the Recursive Doubling and Ring algorithms to implement this collective on top of MPI point-to-point calls. This leads to several performance…
The IBM Blue Gene/P (BG/P) system is a massively parallel supercomputer succeeding BG/L, scaling the system size by orders of magnitude while significantly improving power efficiency. BG/P introduces many enhancements to the machine design and new architectural features at the hardware and software levels. In this work, we demonstrate techniques to…