Amith R. Mamidala

Introduction
• MPI provides both point-to-point and collective communication
• Efficient and scalable collective communication is very important to high performance applications
• Modern interconnects provide certain support in hardware for collective communication
  – Hardware multicast in InfiniBand
• Collective at hardware level usually has different ...
InfiniBand, 10 Gigabit Ethernet/iWARP and IB WAN extensions are rapidly gaining momentum for designing high-end computing clusters and data centers. For typical applications such as data staging, content replication and remote site backup, FTP has been the most popular method to transfer bulk data within and across these ...
MPI_Alltoall is one of the most communication-intensive collective operations used in many parallel applications. Recently, the supercomputing arena has witnessed phenomenal growth of commodity clusters built using InfiniBand and multi-core systems. In this context, it is important to optimize this operation for these emerging clusters to allow for good ...
Modern interconnects and corresponding high performance MPIs have been feeding the surge in the popularity of compute clusters and computing applications. Recently, with the introduction of the iWARP (Internet Wide Area RDMA Protocol) standard, RDMA and zero-copy data transfer capabilities have been introduced and standardized for Ethernet networks. While ...
Summary: InfiniBand has become a very popular interconnect, due to its advanced features and open standard. Large scale InfiniBand clusters are becoming very popular, as reflected by the TOP 500 supercomputer rankings. However, even with popular topologies like the constant bisection bandwidth fat tree, hot-spots may occur with InfiniBand, due to inappropriate ...
Advances in multicore technology and modern interconnects are rapidly increasing the number of cores deployed in today's commodity clusters. A majority of parallel applications written in MPI employ collective operations in their communication kernels. Optimization of these operations on multicore platforms is the key to obtaining good performance ...
Large scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 Supercomputer rankings. At the same time, fat tree has become a popular interconnection topology for these clusters, since it allows multiple paths to be available between a pair of nodes. However, even with fat tree, hot-spots may occur in the network ...
The all-to-all broadcast collective operation is commonly used in parallel scientific applications. This collective operation is called MPI_Allgather in the context of MPI. Contemporary MPI implementations use the Recursive Doubling and Ring algorithms for implementing this collective on top of MPI point-to-point calls. This leads to several performance ...
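For readers unfamiliar with the two algorithms named in the abstract above, the following is a minimal sketch of their communication patterns. It is an in-process simulation that uses plain Python lists in place of MPI point-to-point calls. The function names `ring_allgather` and `recursive_doubling_allgather` are hypothetical, and nothing here is taken from the paper's own design or from any MPI library. The recursive doubling variant assumes a power-of-two number of ranks.

```python
def ring_allgather(blocks):
    """Simulate a ring allgather; blocks[r] is the data owned by rank r."""
    p = len(blocks)
    # buffers[r][i] holds rank i's block once rank r has received it.
    buffers = [[None] * p for _ in range(p)]
    for r in range(p):
        buffers[r][r] = blocks[r]
    for step in range(p - 1):
        # Each rank forwards the block it obtained in the previous step
        # (its own block in step 0) to its right-hand neighbour.
        sends = [((r - step) % p, buffers[r][(r - step) % p]) for r in range(p)]
        for r in range(p):
            idx, data = sends[(r - 1) % p]  # receive from the left neighbour
            buffers[r][idx] = data
    return buffers


def recursive_doubling_allgather(blocks):
    """Simulate recursive doubling; assumes a power-of-two rank count."""
    p = len(blocks)
    if p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two rank count")
    buffers = [{r: blocks[r]} for r in range(p)]
    dist = 1
    while dist < p:
        # Snapshot first so every exchange in a step uses the data held
        # at the start of that step, as a simultaneous exchange would.
        snapshot = [dict(b) for b in buffers]
        for r in range(p):
            partner = r ^ dist  # partner differs in exactly one bit
            buffers[r].update(snapshot[partner])
        dist *= 2
    return [[b[i] for i in range(p)] for b in buffers]
```

With p ranks, the ring pattern takes p - 1 steps and moves one block per rank per step, while recursive doubling takes log2(p) steps and doubles the amount of data exchanged at each step.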