Learn More
Summary form only given. Modern high performance applications require efficient and scalable collective communication operations. Currently, most collective operations are implemented based on point-to-point operations. We propose to use hardware multicast in InfiniBand to design fast and scalable broadcast operations in MPl. InfiniBand supports multicast(More)
This paper explores the performance and optimization of the IBM Blue Gene/Q (BG/Q) five dimensional torus network on up to 16K nodes. The BG/Q hardware supports multiple dynamic routing algorithms and different traffic patterns may require different algorithms to achieve best performance. Between 85% to 95% of peak network performance is achieved for(More)
The rapid growth of InfiniBand, 10 Gigabit Ethernet/iWARP and IB WAN extensions is increasingly gaining momentum for designing high end computing clusters and data-centers. For typical applications such as data staging, content replication and remote site backup, FTP has been the most popular method to transfer data within and across these clusters.(More)
MPI_Alltoall is one of the most communication intense collective operation used in many parallel applications. Recently, the supercomputing arena has witnessed phenomenal growth of commodity clusters built using InfiniBand and multi-core systems. In this context, it is important to optimize this operation for these emerging clusters to allow for good(More)
Modern interconnects and corresponding high performance MPIs have been feeding the surge in the popularity of compute clusters and computing applications. Recently with the introduction of the iWARP (Internet wide area RDMA protocol) standard, RDMA and zero-copy data transfer capabilities have been introduced and standardized for Ethernet networks. While(More)
SUMMARY InfiniBand has become a very popular interconnect, due to its advanced features and open standard. Large scale InfiniBand clusters are becoming very popular, as reflected by the TOP 500 supercomputer rankings. However, even with popular topologies like constant bi-section bandwidth Fat Tree, hot-spots may occur with InfiniBand, due to inappropriate(More)
The advances in multicore technology and modern interconnects is rapidly accelerating the number of cores deployed in today's commodity clusters. A majority of parallel applications written in MPI employ collective operations in their communication kernels. Optimization of these operations on the multicore platforms is the key to obtaining good performance(More)
The Blue Gene/Q machine is the next generation in the line of IBM massively parallel supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each BG/Q node having 68 hardware threads, hybrid programming paradigms, which use message passing among nodes and multi-threading within nodes, are ideal and will enable applications to(More)
Large scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 supercomputer rankings. At the same time, fat tree has become a popular interconnection topology for these clusters, since it allows multiple paths to be available in between a pair of nodes. However, even with fat tree, hot-spots may occur in the network(More)
The All-to-all broadcast collective operation is commonly used in parallel scientific applications. This collective operation is called MPI Allgather in the context of MPI. Contemporary MPI implementations use the Recursive Doubling and Ring algorithms for implementing this collective on top of MPI point-to-point calls. This leads to several performance(More)