- Sandeep K. S. Gupta, S. D. Kaushik, S. Mufti, Sanjay Sharma, Chua-Huang Huang, P. Sadayappan
- 1993 International Conference on Parallel…
- 1993

Efficient generation of communication sets and local index sets is important for evaluation of array expressions in scientific languages such as Fortran-90 and High Performance Fortran implemented on distributed-memory machines. We show that for arrays affinely aligned with templates that are distributed on multiple processors with a block-cyclic…

- S. D. Kaushik, Chua-Huang Huang, J. Ramanujam, P. Sadayappan
- IPPS
- 1995

Table 1: Execution times (ms) for cyclic(s) to cyclic(t) redistribution on 32 processors. … other block sizes t. Fig. 3 shows the total times in milliseconds for a cyclic(192) to cyclic(8) redistribution on 32 processors for increasing data sizes. This redistribution corresponds to the…

- Sandeep K. S. Gupta, S. D. Kaushik, Chua-Huang Huang, P. Sadayappan
- J. Parallel Distrib. Comput.
- 1996

Array statements are often used to express data-parallelism in scientific languages such as Fortran 90 and High Performance Fortran. In compiling array statements for a distributed-memory machine, efficient generation of communication sets and local index sets is important. We show that for arrays distributed block-cyclically on multiple processors, the local…
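
As a rough illustration of what a local index set is under a block-cyclic distribution, the sketch below assumes the usual cyclic(b) convention: global index `i` lives on processor `(i // b) % P`. The function names are hypothetical, and the enumeration is deliberately brute force; it is not the method developed in the paper, only a reference against which a closed-form scheme could be checked.

```python
def owner(i, block, nprocs):
    """Processor that owns global index i under a cyclic(block) distribution."""
    return (i // block) % nprocs


def local_index(i, block, nprocs):
    """Position of global index i in its owner's local memory."""
    return (i // (block * nprocs)) * block + i % block


def local_index_set(lo, hi, stride, proc, block, nprocs):
    """Global indices of the section lo:hi:stride owned by proc (brute force)."""
    return [i for i in range(lo, hi + 1, stride)
            if owner(i, block, nprocs) == proc]


if __name__ == "__main__":
    # Section 0:11:3 of an array distributed cyclic(2) over 3 processors.
    for p in range(3):
        owned = local_index_set(0, 11, 3, p, 2, 3)
        print(p, owned, [local_index(i, 2, 3) for i in owned])
```

The brute-force scan costs time proportional to the section length on every processor, which is the overhead that an efficient index-set generation scheme aims to avoid.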

- S. D. Kaushik, Chua-Huang Huang, Rodney W. Johnson, P. Sadayappan
- International Conference on Supercomputing
- 1994

We address the development of efficient methods for performing data redistribution of arrays on distributed-memory machines. Data redistribution is important for the distributed-memory implementation of data parallel languages such as High Performance Fortran. An algebraic representation of regular data distributions is used to develop an analytical model…

We present transposition algorithms for matrices that do not fit in main memory. Transposition is interpreted as a permutation of the vector obtained by mapping a matrix to linear memory. Algorithms are derived from factorization of this permutation, using a class of permutations related to the tensor product. Using this formulation of transposition, we…
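
The view of transposition as a permutation of a linear vector can be made concrete with a small in-memory sketch. It assumes row-major storage and uses hypothetical function names; it shows only the permutation itself, not the factorization or the out-of-core algorithms the abstract refers to.

```python
def transpose_permutation(rows, cols):
    """perm[k] is the source position, in the row-major rows x cols layout,
    of the element that lands at position k of the transposed layout."""
    return [(k % rows) * cols + (k // rows) for k in range(rows * cols)]


def apply_permutation(vec, perm):
    """Gather elements of vec according to perm."""
    return [vec[p] for p in perm]


if __name__ == "__main__":
    # A 2 x 3 matrix [[0, 1, 2], [3, 4, 5]] stored as a flat vector.
    flat = list(range(6))
    perm = transpose_permutation(2, 3)
    print(apply_permutation(flat, perm))  # flat layout of the 3 x 2 transpose
```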

- D. L. Dai, Sandeep K. S. Gupta, S. D. Kaushik, J. H. Lu, R. V. Singh, Chua-Huang Huang +2 others
- SC
- 1994

EXTENT is an EXpert system for TENsor product formula Translation. In this paper we present a programming environment for automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronecker) product based programming methodology is used for designing high performance programs on various architectures. In this programming…

- S. D. Kaushik, Sanjay Sharma, Chua-Huang Huang
- J. Inf. Sci. Eng.
- 1993

We use an algebraic theory based on tensor products to model multistage interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors. In this paper, we focus on the modeling of multistage interconnection networks. The tensor product representations…

- S. D. Kaushik, Chua-Huang Huang, P. Sadayappan
- J. Parallel Distrib. Comput.
- 1996

We present an algebraic theory based on tensor products for modeling direct interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors. This theory can be used for mapping algorithms expressed in tensor product form onto distributed-memory…

Distributed-memory implementations of several scientific applications require array redistribution. Array redistribution is used in languages such as High Performance Fortran to dynamically change the distribution of arrays across processors. Performing array redistribution incurs two overheads: an indexing overhead for determining the set of processors to…
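
To make the indexing overhead concrete, the following minimal sketch enumerates, for a cyclic(s) to cyclic(t) redistribution, which elements each source processor must send to each destination processor. It assumes the same number of processors before and after, the convention that index `i` lives on processor `(i // block) % P`, and hypothetical function names. It is a naive element-by-element scan, not the approach taken in the papers listed here.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Processor that owns global index i under a cyclic(block) distribution."""
    return (i // block) % nprocs


def send_sets(n, s, t, nprocs):
    """Map (source, destination) to the global indices that move between them
    when an n-element array goes from cyclic(s) to cyclic(t)."""
    sets = defaultdict(list)
    for i in range(n):
        sets[(owner(i, s, nprocs), owner(i, t, nprocs))].append(i)
    return dict(sets)


if __name__ == "__main__":
    # 8 elements, cyclic(2) to cyclic(1), 2 processors.
    for (src, dst), idx in sorted(send_sets(8, 2, 1, 2).items()):
        print(f"{src} -> {dst}: {idx}")
```

Pairs with `src == dst` stay local and need no message; the remaining pairs are the communication sets.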