An Automatic Computation and Data Decomposition Algorithm of Prioritized Dominant Array
On distributed-memory machines, a processor accesses its local memory much faster than it accesses remote memory through communication. To reduce this communication cost, a parallelizing compiler must produce an efficient computation partition and data distribution, and ensure that the data accessed during a computation resides in local memory. In the course of parallelization, however, we observed that in many cases no globally consistent decomposition exists, and that using multiple data distributions may improve parallel performance. This paper presents a consistency combination algorithm for dynamic decomposition that allows data reorganization. The algorithm first solves the decomposition for each single loop nest, then merges the different decompositions from the resulting sets and applies linear transformations to make the global data distribution consistent. It also takes the structure and control flow of the program into account to determine the priority order in which the linear transformations are applied. The effectiveness of the algorithm is verified at the end of the paper.
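The observation that a single global decomposition may not exist can be illustrated with a small sketch. It is not the algorithm of this paper; it only counts, under assumed conditions, how many reads cross processor boundaries. The assumptions are: an n x n array, block distribution by rows or by columns over p processors (with p dividing n), the owner-computes rule, and two hypothetical loop nests, one reading A[i][j-1] and the other reading A[i-1][j]. The names `owner` and `remote_reads` are illustrative.

```python
def owner(i, j, n, p, dist):
    """Processor that owns element (i, j) under a block distribution.

    Assumes p divides n; dist is "row" (block rows) or "col" (block columns).
    """
    block = n // p
    return i // block if dist == "row" else j // block


def remote_reads(n, p, dist, offset):
    """Count reads of A[i+di][j+dj] that cross processors under owner-computes."""
    di, dj = offset
    count = 0
    for i in range(n):
        for j in range(n):
            si, sj = i + di, j + dj
            # Skip reads that fall outside the array bounds.
            if 0 <= si < n and 0 <= sj < n:
                if owner(i, j, n, p, dist) != owner(si, sj, n, p, dist):
                    count += 1
    return count


n, p = 8, 4
nests = {
    "nest1": (0, -1),  # hypothetical loop nest: A[i][j] = f(A[i][j-1])
    "nest2": (-1, 0),  # hypothetical loop nest: A[i][j] = f(A[i-1][j])
}
costs = {
    (name, dist): remote_reads(n, p, dist, off)
    for name, off in nests.items()
    for dist in ("row", "col")
}
for key in sorted(costs):
    print(key, costs[key])
```

In this toy setting each loop nest is communication-free under one distribution and incurs remote reads under the other, so no single static distribution serves both. That is the kind of situation in which redistributing the data between loop nests may pay off, provided the cost of reorganization is lower than the communication it removes.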