Supporting source-level performance analysis of programs written in data-parallel languages requires a unique degree of integration between compilers and performance analysis tools. Compilers for languages such as High Performance Fortran infer parallelism and communication from data distribution directives; thus, performance tools cannot meaningfully…
We develop algorithms for mapping n-dimensional meshes on a star graph of degree n with expansion 1 and dilation 3. We show that an n-degree star graph can efficiently simulate an n-dimensional mesh.
This paper presents a simple load balancing algorithm and its probabilistic analysis. Unlike most of the previous load balancing algorithms, this algorithm maintains locality. We show that the cost of this load balancing algorithm is small for practical situations and discuss some interesting applications for data remapping.
In this paper we present several algorithms for decomposing all-to-many personalized communication into a set of disjoint partial permutations. These partial permutations avoid node contention as well as link contention. We discuss the theoretical complexity of these algorithms and study their effectiveness both from the viewpoint of static scheduling and from…
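To illustrate what decomposing a communication pattern into partial permutations means, here is a minimal sketch. It is not one of the paper's algorithms: it is a plain first-fit greedy pass, and the function name `greedy_partial_permutations` and the `(source, destination)` input format are assumptions made for this example. It only avoids node contention (each processor sends at most one message and receives at most one message per round); avoiding link contention would need a model of the network topology, which is not attempted here.

```python
def greedy_partial_permutations(messages):
    """Split (source, destination) pairs into rounds without node contention.

    Illustrative first-fit sketch, not the algorithm from the paper.
    Each round is a dict mapping source -> destination in which every
    source and every destination appears at most once.
    """
    rounds = []
    for src, dst in messages:
        for rnd in rounds:
            # The message fits if its sender and receiver are both idle
            # in this round.
            if src not in rnd and dst not in rnd.values():
                rnd[src] = dst
                break
        else:
            # No existing round can take the message; open a new one.
            rounds.append({src: dst})
    return rounds


if __name__ == "__main__":
    # Hypothetical pattern: processor 0 sends to 1 and 2, and so on.
    pattern = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 1)]
    for i, rnd in enumerate(greedy_partial_permutations(pattern)):
        print(f"round {i}: {sorted(rnd.items())}")
```

First-fit is easy to state but gives no guarantee on the number of rounds beyond the obvious lower bound, the largest number of messages any single processor sends or receives.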
In this paper, we present several algorithms for performing all-to-many personalized communication on distributed memory parallel machines. Each processor sends a different message (of potentially different size) to a subset of all the processors involved in the collective communication. The algorithms are based on decomposing the communication matrix into a…