We present a methodology for deriving simple, accurate models that describe the performance of parallel applications without looking at the source code. A trace is obtained and linear models are derived by fitting the outcome of a set of simulations varying the influential parameters, such as: processor speed, network latency or...
A family of oblivious routing schemes for Fat Trees and their slimmed versions is presented in this work. First, two popular oblivious routing algorithms, which we refer to as S-mod-k and D-mod-k, are analyzed in detail. S-mod-k is the default routing algorithm given as an example in the first works formally describing Fat Tree networks. D-mod-k has been...
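As a rough illustration of the two schemes named above, the sketch below assumes a full k-ary n-tree with end nodes numbered 0 to k^n - 1, leaf switches at level 0, and the commonly cited formulation in which the up-port at level l is floor(x / k^l) mod k, where x is the source (S-mod-k) or the destination (D-mod-k). The function names and the level numbering are illustrative choices, not taken from the paper, and slimmed trees are not covered.

```python
def up_port(node_id: int, level: int, k: int) -> int:
    """Up-port chosen at a switch of the given level (leaf switches are level 0)."""
    return (node_id // k**level) % k


def ascent_ports(src: int, dst: int, k: int, n: int, scheme: str) -> list[int]:
    """Up-ports taken from the source's leaf switch up to the nearest common ancestor.

    Assumes a full k-ary n-tree; `scheme` is "S-mod-k" or "D-mod-k".
    """
    if scheme not in ("S-mod-k", "D-mod-k"):
        raise ValueError("scheme must be 'S-mod-k' or 'D-mod-k'")
    if not (0 <= src < k**n and 0 <= dst < k**n):
        raise ValueError("node ids must lie in [0, k**n)")

    # S-mod-k keys the port choice on the source id, D-mod-k on the destination id.
    key = src if scheme == "S-mod-k" else dst

    # Smallest subtree height (in switch levels) that contains both end nodes.
    height = 1
    while src // k**height != dst // k**height:
        height += 1

    # One up-port decision per level below the nearest common ancestor.
    return [up_port(key, level, k) for level in range(height - 1)]


if __name__ == "__main__":
    # 2-ary 3-tree: 8 end nodes, 3 switch levels.
    for scheme in ("S-mod-k", "D-mod-k"):
        print(scheme, ascent_ports(5, 0, k=2, n=3, scheme=scheme))
```

Because the port choice depends only on one endpoint id, the route is fixed regardless of load, which is what makes both schemes oblivious.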
Dragonfly networks have recently been proposed for the interconnection network of forthcoming exascale supercomputers. Relying on large-radix routers, they build a topology with low diameter and high throughput, divided into multiple groups of routers. While minimal routing is appropriate for uniform traffic patterns, adversarial traffic patterns can...
The personalized all-to-all collective exchange is one of the most challenging communication patterns in HPC applications in terms of performance and scalability. In the context of the fat tree family of interconnection networks, widely used in current HPC systems and datacenters, we show that there is potential for optimizing this traffic pattern by...
Interference from nearby jobs has recently been identified as the dominant reason for the high performance variability of parallel applications running on High Performance Computing (HPC) systems. Typically, HPC systems are dynamic, with multiple jobs arriving and leaving in an unpredictable fashion while simultaneously sharing the system interconnection network. In...
In the context of developing next-generation high-performance computing systems, there is often a need for an "end-to-end" simulation tool that can simulate the behaviour of a full application on a reasonably faithful model of the actual system. Considering the ever-increasing levels of parallelism, we take a communication-centric view of the system...
New static source routing algorithms for High Performance Computing (HPC) are presented in this work. The target parallel architectures are based on the commonly used fat-tree networks and their slimmed versions. The evaluation of such proposals and their comparison against currently used routing mechanisms have been driven by realistic traffic generated by...
We describe a methodology to derive a simple characterization of a parallel program and models of its performance on a target architecture. Our approach starts from an instrumented run of the program to obtain a trace. A simple linear model of the performance of the application as a function of architectural parameters is then derived by fitting the results...
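The fitting step described above can be sketched as an ordinary least-squares problem. Everything in the example is an assumption made for illustration: the two parameters (inverse processor speed and network latency), the `toy_simulator` function standing in for the trace-driven simulations, and its coefficients are invented and are not results from the paper.

```python
def fit_linear_model(samples, times):
    """Least-squares fit of time ~ b0 + b1*x1 + ... + bp*xp via the normal equations."""
    rows = [[1.0, *x] for x in samples]  # prepend the intercept column
    p = len(rows[0])
    # Build A = X^T X and y = X^T t.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    y = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("parameter sweep does not determine the model")
        a[col], a[pivot] = a[pivot], a[col]
        y[col], y[pivot] = y[pivot], y[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [arj - f * acj for arj, acj in zip(a[r], a[col])]
                y[r] -= f * y[col]
    return [y[i] / a[i][i] for i in range(p)]


def toy_simulator(inv_cpu_speed, latency_us):
    """Hypothetical stand-in for one simulation run; coefficients are made up."""
    return 2.0 + 8.0 * inv_cpu_speed + 0.3 * latency_us


# Hypothetical parameter sweep: (inverse relative CPU speed, network latency in us).
sweep = [(1.0, 1.0), (0.5, 1.0), (1.0, 5.0), (0.5, 5.0), (0.25, 10.0)]
predicted_times = [toy_simulator(*point) for point in sweep]
coefficients = fit_linear_model(sweep, predicted_times)

if __name__ == "__main__":
    print([round(c, 6) for c in coefficients])
```

With real simulation output the fit would not be exact, so the residuals would need to be inspected to judge whether a linear model is adequate for the application at hand.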