Jhy-Chun Wang

Supporting source-level performance analysis of programs written in data-parallel languages requires a unique degree of integration between compilers and performance analysis tools. Compilers for languages such as High Performance Fortran infer parallelism and communication from data distribution directives; thus, performance tools cannot meaningfully …
In this paper, we present several algorithms for performing all-to-many personalized communication on distributed-memory parallel machines. Each processor sends a different message (of potentially different size) to a subset of all the processors involved in the collective communication. The algorithms are based on decomposing the communication matrix into a …
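The abstract is truncated before it names the decomposition, so the following is only an illustrative sketch of one common approach: peeling the communication matrix into contention-free phases (partial permutations) in which every processor sends at most one message and receives at most one message. The function name `decompose_into_phases` and the example matrix `comm` are hypothetical, not taken from the paper.

```python
# Sketch, assuming a greedy decomposition into partial permutations.
# comm[i][j] > 0 means processor i must send a message of that size to j.
from typing import Dict, List, Tuple


def decompose_into_phases(comm: List[List[int]]) -> List[Dict[int, Tuple[int, int]]]:
    """Greedily peel contention-free phases off the communication matrix.

    Returns a list of phases; each phase maps sender -> (receiver, message size),
    with each processor appearing at most once as sender and once as receiver.
    """
    p = len(comm)
    remaining = [(i, j, comm[i][j]) for i in range(p) for j in range(p) if comm[i][j] > 0]
    phases: List[Dict[int, Tuple[int, int]]] = []

    while remaining:
        busy_senders, busy_receivers = set(), set()
        phase: Dict[int, Tuple[int, int]] = {}
        deferred = []
        for src, dst, size in remaining:
            if src not in busy_senders and dst not in busy_receivers:
                phase[src] = (dst, size)            # schedule in this phase
                busy_senders.add(src)
                busy_receivers.add(dst)
            else:
                deferred.append((src, dst, size))   # postpone to a later phase
        phases.append(phase)
        remaining = deferred

    return phases


if __name__ == "__main__":
    # Hypothetical 4-processor example: each row lists a processor's outgoing messages.
    comm = [
        [0, 10, 5, 0],
        [0, 0, 0, 8],
        [3, 0, 0, 0],
        [7, 0, 0, 0],
    ]
    for k, phase in enumerate(decompose_into_phases(comm)):
        print(f"phase {k}: {phase}")
```

Each resulting phase can be executed as a round of point-to-point sends with no processor handling more than one incoming and one outgoing message, which is the usual motivation for this kind of decomposition.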