In pipelined parallel computations, the inner loops are often implemented in a block fashion. For such programs, an important compiler optimization is the static determination of the grain size. This paper extends and experimentally validates the previous results of Andonov and Rajopadhye on optimal grain size determination.
In this paper we discuss the problem of finding the optimal tiling transformation of three-dimensional uniform recurrences on a two-dimensional torus/grid of distributed-memory general-purpose machines. We show that even for the simplest case of recurrences that allow such a transformation, the corresponding problem of minimizing the total running time is…