Misbah Mubarak

Learn More
A low-latency and low-diameter interconnection network will be an important component of future exascale architectures. The dragonfly network topology, a two-level directly connected network, is a candidate for exascale architectures because of its low diameter and reduced latency. To date, small-scale simulations with a few thousand nodes have been carried(More)
A high-bandwidth, low-latency interconnect will be a critical component of future exascale systems. The torus network topology, which uses multidimensional network links to improve path diversity and exploit locality between nodes, is a potential candidate for exascale interconnects. The communication behavior of large-scale scientific applications running(More)
This paper presents a preliminary evaluation of TraceR, a trace replay tool built upon the ROSS-based CODES simulation framework. TraceR can be used for predicting network performance and understanding network behavior by simulating messaging on interconnec-tion networks. It addresses two major shortcomings in current network simulators. First, it enables(More)
MPI collective operations are a critical and frequently used part of most MPI-based large-scale scientific applications. In previous work, we have enabled the Rensselaer Optimistic Simulation System (ROSS) to predict the performance of MPI point-to-point messaging on high-fidelity million-node network simulations of torus and dragonfly interconnects. The(More)
With the increasing complexity of today's high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems-in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2)(More)
Accurate analysis of HPC storage system designs is contingent on the use of I/O workloads that are truly representative of expected use. However, I/O analyses are generally bound to specific workload modeling techniques such as synthetic benchmarks or trace replay mechanisms, despite the fact that no single workload modeling technique is appropriate for all(More)
Fault response strategies are crucial to maintaining performance and availability in HPC storage systems, and the first responsibility of a successful fault response strategy is to detect failures and maintain an accurate view of group membership. This is a nontrivial problem given the unreliable nature of communication networks and other system components.(More)
Implementing mobility in MultiAgent systems is one of the major challenges these days. True mobility consists of movement of code, data and execution state. There are various techniques through which true mobility can be achieved in MultiAgent systems. Certain mechanisms are present that are used to capture the state of a thread and re-establish it at the(More)
Network contention between concurrently running jobs on HPC systems is a primary cause of performance variability. Optimizing job allocation and avoiding network sharing are hence crucial to alleviate the potential performance degradation. In order to do so effectively, an understanding of the interference among concurrently running jobs, their(More)