S. D. Kaushik

We present transposition algorithms for matrices that do not fit in main memory. Transposition is interpreted as a permutation of the vector obtained by mapping a matrix to linear memory. Algorithms are derived from factorization of this permutation, using a class of permutations related to the tensor product. Using this formulation of transposition, we(More)
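To make the "transposition as a permutation" view concrete, here is a minimal Python sketch. It assumes a row-major mapping of the matrix to linear memory, and the function names are hypothetical. It illustrates only the permutation itself, not the out-of-core algorithms or the factorization described in the abstract. In the tensor product literature this kind of permutation is usually called a stride permutation; the abstract does not name it, so that label is an assumption here.

```python
def transpose_as_permutation(vec, rows, cols):
    """Permute a row-major rows x cols matrix (flattened in vec) into
    its row-major cols x rows transpose.

    Element (i, j) sits at index i*cols + j in the input and must land
    at index j*rows + i in the output.
    """
    if len(vec) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    out = [None] * len(vec)
    for i in range(rows):
        for j in range(cols):
            out[j * rows + i] = vec[i * cols + j]
    return out


def transpose_by_stride(vec, rows, cols):
    """Same permutation, written as strided reads: gather every
    cols-th element starting at offsets 0, 1, ..., cols - 1."""
    if len(vec) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    out = []
    for j in range(cols):
        out.extend(vec[j::cols])
    return out


if __name__ == "__main__":
    m = [1, 2, 3, 4, 5, 6]  # [[1, 2, 3], [4, 5, 6]] in row-major order
    print(transpose_as_permutation(m, 2, 3))
    print(transpose_by_stride(m, 2, 3))
```

Both functions compute the same reordering; the strided form hints at why the permutation can be expressed with regular access patterns.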
Table 1: Execution times (ms) for cyclic(s) to cyclic(t) redistribution on 32 processors (columns: s, t, lcm, lcm*2, lcm*4, gcd, gcd/2, gcd/4). ... other block sizes t. Fig. 3 shows the total times in milliseconds for a cyclic(192) to cyclic(8) redistribution on 32 processors for increasing data sizes. This redistribution corresponds to the(More)
Array statements are often used to express data-parallelism in scientific languages such as Fortran 90 and High Performance Fortran. In compiling array statements for a distributed-memory machine, efficient generation of communication sets and local index sets is important. We show that for arrays distributed block-cyclically on multiple processors, the local(More)
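As a rough illustration of what a local index set is under a block-cyclic distribution, the following Python sketch maps global indices to (owner, local index) pairs. It assumes 0-based indexing and the usual cyclic(b) convention, in which blocks of b consecutive elements are dealt round-robin to the processors. The function names are hypothetical, and the brute-force enumeration is not the efficient method the abstract refers to.

```python
def owner_and_local(g, block, nprocs):
    """Map global index g to (owning processor, local index) under a
    cyclic(block) distribution over nprocs processors (0-based)."""
    blk = g // block            # which block the element falls in
    owner = blk % nprocs        # blocks are dealt round-robin
    local = (blk // nprocs) * block + g % block
    return owner, local


def local_index_set(n, block, nprocs, proc, lo=0, hi=None, stride=1):
    """Local indices on proc touched by the array section lo:hi:stride
    of an n-element array (hi is exclusive). Brute-force enumeration."""
    if hi is None:
        hi = n
    result = []
    for g in range(lo, min(hi, n), stride):
        owner, local = owner_and_local(g, block, nprocs)
        if owner == proc:
            result.append(local)
    return result


if __name__ == "__main__":
    # 12 elements, cyclic(2) on 3 processors
    print(owner_and_local(7, 2, 3))
    print(local_index_set(12, 2, 3, proc=0))
```

A compiler would want closed-form or table-driven generation of these sets rather than a scan over every global index; the sketch only fixes the definitions.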
EXTENT is an EXpert system for TENsor product formula Translation. In this paper we present a programming environment for automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronecker) product based programming methodology is used for designing high performance programs on various architectures. In this programming(More)
We address the development of efficient methods for performing data redistribution of arrays on distributed-memory machines. Data redistribution is important for the distributed-memory implementation of data parallel languages such as High Performance Fortran. An algebraic representation of regular data distributions is used to develop an analytical model(More)
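For readers unfamiliar with cyclic(s) to cyclic(t) redistribution, the sketch below enumerates which global indices move between each source and destination processor. It is a brute-force illustration under assumed conventions (0-based indices, round-robin block dealing, hypothetical function names), not the algebraic representation or analytical model developed in the paper. It also prints the period after which the source/destination ownership pattern repeats, which is the number of processors times lcm(s, t).

```python
from math import gcd


def owner(g, block, nprocs):
    """Processor owning global index g under cyclic(block)."""
    return (g // block) % nprocs


def communication_sets(n, s, t, nprocs):
    """Group global indices 0..n-1 by (source, destination) processor
    for a cyclic(s) -> cyclic(t) redistribution."""
    sets = {}
    for g in range(n):
        key = (owner(g, s, nprocs), owner(g, t, nprocs))
        sets.setdefault(key, []).append(g)
    return sets


def pattern_period(s, t, nprocs):
    """Ownership under cyclic(s) repeats every s*nprocs elements and
    under cyclic(t) every t*nprocs, so the joint pattern repeats every
    nprocs * lcm(s, t) elements."""
    return nprocs * (s * t // gcd(s, t))


if __name__ == "__main__":
    sets = communication_sets(8, s=2, t=1, nprocs=2)
    for (src, dst), idx in sorted(sets.items()):
        print(f"proc {src} -> proc {dst}: {idx}")
    print("period:", pattern_period(2, 1, 2))
```

Pairs with src equal to dst need no communication; the remaining entries are the messages that would be exchanged.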
We use an algebraic theory based on tensor products to model multistage interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors. In this paper, we focus on the modeling of multistage interconnection networks. The tensor product representations(More)
We present an algebraic theory based on tensor products for modeling direct interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors. This theory can be used for mapping algorithms expressed in tensor product form onto distributed-memory(More)