Cyril Randriamaro

This article is devoted to the run-time redistribution of one-dimensional arrays that are distributed in a block-cyclic fashion over a processor grid. While previous studies have concentrated on efficiently generating the communication messages to be exchanged by the processors involved in the redistribution, we focus on the scheduling of those messages: how …
Minimizing communication overhead when mapping affine loop nests onto distributed memory parallel computers (DMPCs) is a key problem with regard to performance, and many authors have dealt with it. Not all communications are equivalent. Local communications (translations), simple communications (horizontal or vertical ones), or structured communications …
Linear algebra on distributed-memory parallel computers raises the problem of data distribution of matrices and vectors among the processes. Block-cyclic distribution works well for most algorithms. The block size must be chosen carefully, however, in order to achieve good efficiency and good load balancing. This choice depends heavily on each operation; …
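As a rough illustration of the block-cyclic layout and of why the block size affects load balance, the following minimal Python sketch may help. It is not taken from the papers listed here: the function names (`owner`, `local_index`, `load_per_processor`) are hypothetical, and it assumes the usual one-dimensional convention in which consecutive blocks of `block_size` elements are dealt to processors in round-robin order.

```python
from collections import Counter


def owner(i, block_size, num_procs):
    """Processor owning global index i (assumed round-robin dealing of blocks)."""
    return (i // block_size) % num_procs


def local_index(i, block_size, num_procs):
    """Position of global index i inside its owner's local array."""
    full_rounds = i // (block_size * num_procs)  # complete rounds before i
    return full_rounds * block_size + i % block_size


def load_per_processor(n, block_size, num_procs):
    """Number of elements of an n-element array held by each processor."""
    counts = Counter(owner(i, block_size, num_procs) for i in range(n))
    return [counts.get(p, 0) for p in range(num_procs)]


if __name__ == "__main__":
    # 12 elements over 3 processors: blocks of 2 divide evenly,
    # blocks of 5 leave the last processor with a short block.
    print(load_per_processor(12, 2, 3))
    print(load_per_processor(12, 5, 3))
```

With 12 elements on 3 processors, a block size of 2 gives every processor 4 elements, whereas a block size of 5 leaves the last processor with only 2, which is one simple way in which a poorly chosen block size can hurt load balance.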
This article is devoted to the run-time redistribution of one-dimensional arrays that are distributed in a block-cyclic fashion over a processor grid. In a previous paper [2], we reported how to derive optimal schedules made up of successive communication steps. In this paper we assume that successive steps may overlap. We show how to obtain an optimal …