Jhy-Chun Wang

Supporting source-level performance analysis of programs written in data-parallel languages requires a unique degree of integration between compilers and performance analysis tools. Compilers for languages such as High Performance Fortran infer parallelism and communication from data distribution directives; thus, performance tools cannot meaningfully(More)
In this paper, we present several algorithms for performing all-to-many personalized communication on distributed memory parallel machines. Each processor sends a different message (of potentially different size) to a subset of all the processors involved in the collective communication. The algorithms are based on decomposing the communication matrix(More)
With the advent of new routing methods, the distance over which a message is sent is becoming less and less important. Thus, assuming no link contention, permutation seems to be an efficient collective communication primitive. In this paper we present several algorithms for decomposing all-to-many personalized communication into a set of disjoint(More)
This paper presents a simple load balancing algorithm and its probabilistic analysis. Unlike most previous load balancing algorithms, this algorithm maintains locality. We show that the cost of this load balancing algorithm is small for practical situations and discuss some interesting applications for data remapping. Index Terms: Data locality,(More)
In this paper we present several algorithms for performing all-to-many personalized communication on distributed memory parallel machines. We assume that each processor sends a different message (of potentially different size) to a subset of all the processors involved in the collective communication. The algorithms are based on decomposing the communication(More)
To support the transition from programming languages in which parallelism and communication are explicit to high-level languages that rely on compilers to infer such details from data decomposition directives, tools for performance analysis require increased sophistication and integration with other components in the programming system. We explore(More)
A communication package, Non-uniform Irregular Communication Exchange (NICE), is designed to help users schedule message-passing requests on distributed-memory machines. This package schedules a batch of messages into a set of partial permutations and provides communication primitives to carry out the communication. The NICE primitives are focused on(More)