Amith R. Mamidala

The Blue Gene/Q machine is the next generation in the line of IBM massively parallel supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each BG/Q node having 68 hardware threads, hybrid programming paradigms, which use message passing among nodes and multi-threading within nodes, are ideal and will enable applications to ...
The IBM Blue Gene/P (BG/P) system is a massively parallel supercomputer succeeding BG/L, with orders-of-magnitude growth in system size and significantly improved power efficiency. BG/P comes with many enhancements to the machine design and new architectural features at the hardware and software levels. In this work, we demonstrate techniques to ...
Advances in multicore technology and modern interconnects are rapidly increasing the number of cores deployed in today's commodity clusters. A majority of parallel applications written in MPI employ collective operations in their communication kernels. Optimizing these operations on multicore platforms is the key to obtaining good performance ...
Summary form only given. Modern high performance applications require efficient and scalable collective communication operations. Currently, most collective operations are implemented on top of point-to-point operations. We propose to use hardware multicast in InfiniBand to design fast and scalable broadcast operations in MPI. InfiniBand supports multicast ...
Current algorithms for Barrier and Allreduce, such as pair-wise exchange, dissemination, and gather-broadcast, do not perform optimally when there is skew in the system. In pair-wise exchange and dissemination, all the nodes must arrive before each step can complete. The gather-broadcast algorithm assumes a fixed tree topology. In this paper, we ...
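To illustrate why every node must arrive before a dissemination step can complete, here is a minimal sketch of the textbook dissemination pattern. It is not the method proposed in the paper above; the function names `dissemination_schedule` and `simulate` are hypothetical, and the code only simulates the communication schedule in a single process rather than calling MPI.

```python
def dissemination_schedule(n):
    """Per round, list the (send_to, recv_from) partners of every rank.

    In round k, rank r signals rank (r + 2^k) mod n and waits for a
    signal from rank (r - 2^k) mod n.
    """
    rounds = (n - 1).bit_length()  # ceil(log2(n)) rounds, 0 when n == 1
    schedule = []
    for k in range(rounds):
        step = 1 << k
        schedule.append([((r + step) % n, (r - step) % n) for r in range(n)])
    return schedule


def simulate(n):
    """Return, for each rank, the set of ranks whose arrival it has learned of."""
    known = [{r} for r in range(n)]
    for round_pairs in dissemination_schedule(n):
        # Snapshot the state so every rank receives what its partner
        # knew at the start of the round.
        snapshot = [set(s) for s in known]
        for r, (_, recv_from) in enumerate(round_pairs):
            known[r] |= snapshot[recv_from]
    return known
```

Because each rank blocks on one specific partner in every round, a single late rank delays its partner in round 0, that partner's partner in round 1, and so on, which is the sensitivity to skew the abstract refers to.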
InfiniBand, 10 Gigabit Ethernet/iWARP, and IB WAN extensions are rapidly gaining momentum for designing high-end computing clusters and data centers. For typical applications such as data staging, content replication, and remote site backup, FTP has been the most popular method to transfer data within and across these clusters. ...
Large-scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 supercomputer rankings. At the same time, the fat tree has become a popular interconnection topology for these clusters, since it makes multiple paths available between a pair of nodes. However, even with a fat tree, hot-spots may occur in the network ...
MPI_Alltoall is one of the most communication-intensive collective operations used in many parallel applications. Recently, the supercomputing arena has witnessed phenomenal growth of commodity clusters built using InfiniBand and multi-core systems. In this context, it is important to optimize this operation for these emerging clusters to allow for good ...
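As a reminder of what the operation does, the following is a small single-process sketch of the data movement performed by MPI_Alltoall. It models the semantics only and does not call MPI or reflect the optimizations studied in the paper; the helper names `alltoall` and `message_count` are hypothetical.

```python
def alltoall(send_buffers):
    """Simulate an all-to-all exchange among n ranks.

    send_buffers[src][dst] is the block that rank src sends to rank dst.
    The result recv[dst][src] holds the block that rank dst received
    from rank src, i.e. the transpose of the send layout.
    """
    n = len(send_buffers)
    if any(len(row) != n for row in send_buffers):
        raise ValueError("each rank must supply exactly one block per rank")
    return [[send_buffers[src][dst] for src in range(n)] for dst in range(n)]


def message_count(n):
    """Number of blocks that cross between distinct ranks in one exchange."""
    return n * (n - 1)
```

Every rank exchanges a distinct block with every other rank, so the number of inter-rank blocks grows quadratically with the number of ranks, which is why the operation is so communication intensive.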