Jhy-Chun Wang

With the advent of new routing methods, the distance over which a message is sent is becoming less and less important. Thus, assuming no link contention, permutation seems to be an efficient collective communication primitive. In this paper we present several algorithms for decomposing all-to-many personalized communication into a set of disjoint…
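To make "decomposing into disjoint permutations" concrete, here is a minimal sketch under stated assumptions. It is not necessarily one of the paper's algorithms. It assumes a hypothetical boolean p × p communication matrix `comm` where `comm[i][j]` is true if processor i must send to processor j, ignores self-messages (treated as local copies), and greedily peels off phases in which each processor sends at most one message and receives at most one (a partial permutation). Greedy peeling can use more phases than the lower bound of the maximum number of messages any single processor sends or receives; reaching that bound requires a proper bipartite edge coloring (König's theorem).

```python
from typing import List, Tuple


def decompose_into_permutations(comm: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """Greedily split the nonzero off-diagonal entries of a p x p communication
    matrix into phases. Within a phase, every processor sends at most one
    message and receives at most one message (a partial permutation).

    Hypothetical helper for illustration; self-messages (i == j) are skipped
    on the assumption that they are local copies."""
    p = len(comm)
    remaining = {(i, j) for i in range(p) for j in range(p) if comm[i][j] and i != j}
    phases: List[List[Tuple[int, int]]] = []
    while remaining:
        senders, receivers = set(), set()
        phase: List[Tuple[int, int]] = []
        # Scan in a fixed order so the result is deterministic.
        for i, j in sorted(remaining):
            if i not in senders and j not in receivers:
                phase.append((i, j))
                senders.add(i)
                receivers.add(j)
        remaining -= set(phase)
        phases.append(phase)
    return phases


if __name__ == "__main__":
    T, F = True, False
    comm = [[F, T, T, F],
            [F, F, T, F],
            [T, F, F, T],
            [F, F, F, F]]
    for k, phase in enumerate(decompose_into_permutations(comm)):
        print(f"phase {k}: {phase}")
```

For the 4-processor example, the first phase is `[(0, 1), (1, 2), (2, 0)]` and the second is `[(0, 2), (2, 3)]`, so every required message is delivered in two contention-free steps.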
Supporting source-level performance analysis of programs written in data-parallel languages requires a unique degree of integration between compilers and performance analysis tools. Compilers for languages such as High Performance Fortran infer parallelism and communication from data distribution directives; thus, performance tools cannot meaningfully…
In this paper, we present several algorithms for performing all-to-many personalized communication on distributed-memory parallel machines. Each processor sends a different message (of potentially different size) to a subset of all the processors involved in the collective communication. The algorithms are based on decomposing the communication matrix into a…
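Because messages may differ in size, the cost of a permutation-based schedule depends on how sizes fall into each phase. The sketch below illustrates this with a simple, swapped-in cyclic-shift schedule: in phase k, processor i sends to processor (i + k) mod p. It is not claimed to be the paper's decomposition. It assumes a hypothetical p × p matrix `sizes` (0 meaning "no message"), and it uses an assumed cost model in which a phase costs a fixed startup plus a per-unit time multiplied by the largest message in that phase, since all transfers in a phase proceed in parallel without contention.

```python
from typing import List, Tuple

Message = Tuple[int, int, int]  # (sender, receiver, size)


def cyclic_schedule(sizes: List[List[int]]) -> List[List[Message]]:
    """Phase k (1 <= k < p): processor i sends to (i + k) mod p.
    Each phase is a permutation; zero-size (absent) messages are dropped,
    and phases with no messages at all are skipped. Illustrative only."""
    p = len(sizes)
    phases: List[List[Message]] = []
    for k in range(1, p):
        phase = [(i, (i + k) % p, sizes[i][(i + k) % p])
                 for i in range(p) if sizes[i][(i + k) % p] > 0]
        if phase:
            phases.append(phase)
    return phases


def estimated_cost(phases: List[List[Message]], startup: float, per_unit: float) -> float:
    """Assumed cost model: each phase costs startup + per_unit * (largest message
    in that phase), because transfers within a phase run concurrently."""
    return sum(startup + per_unit * max(size for _, _, size in phase) for phase in phases)


if __name__ == "__main__":
    sizes = [[0, 4, 0],
             [0, 0, 2],
             [5, 1, 0]]
    phases = cyclic_schedule(sizes)
    print(phases)
    print(estimated_cost(phases, startup=1.0, per_unit=1.0))
```

Here the first phase carries messages of sizes 4, 2, and 5 and is dominated by the size-5 message, while the second phase carries only the size-1 message. A size-aware decomposition would try to group messages of similar size into the same phase to reduce this imbalance.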
