- Zden Ek, Johan Thomas, +7 authors S L Johnsson
- Preprint
- 1992

A finite element method for computational fluid dynamics has been implemented on the Connection Machine systems CM-2 and CM-200. An implicit iterative solution strategy, based on the preconditioned matrix-free GMRES algorithm, is employed. Parallel data structures built on both nodal and elemental sets are used to achieve maximum parallelization.…
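"Matrix-free" means the solver never stores the global matrix; it only needs a routine that applies the operator to a vector. The sketch below shows right-preconditioned GMRES written that way. It is a plain sequential Python sketch under stated assumptions: the names `gmres`, `matvec` and `precond`, the absence of restarts, and the stencil operator in the demo are illustrative choices, not the data-parallel CM-2/CM-200 implementation or the preconditioner used in the paper.

```python
import math
from typing import Callable, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def gmres(matvec: Callable[[Vec], Vec], b: Vec,
          precond: Callable[[Vec], Vec] = lambda v: v,
          tol: float = 1e-10, max_iter: int = 50) -> Vec:
    """Right-preconditioned GMRES (no restarts, x0 = 0) for A x = b.

    A is never stored: it is touched only through matvec(v) = A v.
    precond(v) applies M^{-1}; the default is no preconditioning.
    """
    n = len(b)
    beta = math.sqrt(dot(b, b))
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]  # orthonormal Krylov basis vectors
    H: List[Vec] = []              # columns of the rotated Hessenberg matrix
    cs: Vec = []                   # Givens cosines
    sn: Vec = []                   # Givens sines
    g = [beta]                     # rotated right-hand side beta * e1
    for j in range(max_iter):
        w = matvec(precond(V[j]))  # w = A M^{-1} v_j
        h = []
        for i in range(j + 1):     # modified Gram-Schmidt against v_0..v_j
            hij = dot(w, V[i])
            h.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, V[i])]
        h_next = math.sqrt(dot(w, w))
        for i in range(j):         # apply earlier rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[j], h_next)  # new rotation eliminates h_next
        cs.append(h[j] / r)
        sn.append(h_next / r)
        h[j] = r
        g.append(-sn[j] * g[j])
        g[j] = cs[j] * g[j]
        H.append(h)
        if abs(g[j + 1]) <= tol * beta or h_next == 0.0:
            break                  # |g[j+1]| is the current residual norm
        V.append([wk / h_next for wk in w])
    k = len(H)
    y = [0.0] * k                  # back-substitution on the triangular system
    for i in range(k - 1, -1, -1):
        s = sum(H[m][i] * y[m] for m in range(i + 1, k))
        y[i] = (g[i] - s) / H[i][i]
    u = [sum(V[m][p] * y[m] for m in range(k)) for p in range(n)]
    return precond(u)              # x = M^{-1} u


if __name__ == "__main__":
    # Matrix-free 1D Laplacian stencil: y_i = 2 v_i - v_{i-1} - v_{i+1}
    def laplace(v: Vec) -> Vec:
        n = len(v)
        return [2 * v[i] - (v[i - 1] if i > 0 else 0.0)
                - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]

    rhs = [1.0] * 10
    sol = gmres(laplace, rhs)
    print(max(abs(bi - ai) for bi, ai in zip(rhs, laplace(sol))))
```

Passing a diagonal (Jacobi) scaling as `precond` is the simplest way to try preconditioning with this sketch; the preconditioner actually used in the paper is not stated in the abstract.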

- Kapil K. Mathur, S. Lennart Johnsson
- Parallel Computing
- 1994

Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. All algorithms…
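The abstract does not spell out the systolic algorithms themselves. As a hedged illustration of what "systolic" means for matrix multiplication, the sketch below simulates Cannon's algorithm, a classic systolic scheme swapped in here, on a p × p grid of virtual processors: after an initial skew, each step multiplies the local blocks and then shifts A blocks one position left and B blocks one position up. The function names and nested-list layout are hypothetical; the sketch assumes square matrices whose size is divisible by p and, unlike the algorithm described in the paper, does not compute the product in place.

```python
from typing import List

Matrix = List[List[float]]


def get_block(M: Matrix, bi: int, bj: int, b: int) -> Matrix:
    """Copy the b x b block at block coordinates (bi, bj)."""
    return [row[bj * b:(bj + 1) * b] for row in M[bi * b:(bi + 1) * b]]


def local_multiply_add(C: Matrix, A: Matrix, B: Matrix) -> None:
    """C += A B for small dense local blocks."""
    for i in range(len(A)):
        for j in range(len(B[0])):
            C[i][j] += sum(A[i][t] * B[t][j] for t in range(len(B)))


def cannon(A: Matrix, B: Matrix, p: int) -> Matrix:
    """Simulate Cannon's systolic algorithm on a p x p processor grid."""
    n = len(A)
    assert n % p == 0, "matrix size must be divisible by p"
    b = n // p
    # Initial skew: processor (i, j) starts with A(i, i+j) and B(i+j, j).
    a = [[get_block(A, i, (i + j) % p, b) for j in range(p)] for i in range(p)]
    bb = [[get_block(B, (i + j) % p, j, b) for j in range(p)] for i in range(p)]
    c = [[[[0.0] * b for _ in range(b)] for _ in range(p)] for _ in range(p)]
    for _ in range(p):
        for i in range(p):
            for j in range(p):
                local_multiply_add(c[i][j], a[i][j], bb[i][j])
        # Systolic shift: A blocks move one step left, B blocks one step up.
        a = [[a[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        bb = [[bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    # Gather the distributed C blocks into one matrix.
    C = [[0.0] * n for _ in range(n)]
    for i in range(p):
        for j in range(p):
            for r in range(b):
                C[i * b + r][j * b:(j + 1) * b] = c[i][j][r]
    return C
```

At step s, processor (i, j) holds A(i, k) and B(k, j) with k = i + j + s mod p, so over p steps every k is visited once and each block of C accumulates the full sum without any processor holding more than one block of each operand.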

A data parallel implementation of the multiplication of matrices of arbitrary shapes and sizes is presented. A systolic algorithm based on a rectangular processor layout is used by the implementation. All processors contain submatrices of the same size for a given operand. Matrix-vector multiplication is used as a primitive for local matrix-matrix…
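One minimal reading of "matrix-vector multiplication as a primitive for local matrix-matrix" multiplication, assumed here rather than taken from the paper, is to form each column of the local product as a matrix-vector product with the matching column of the right operand. The helper names below are hypothetical, and the shapes are arbitrary (m × k times k × n), in keeping with the abstract's emphasis on non-square operands.

```python
from typing import List

Matrix = List[List[float]]


def matvec(A: Matrix, x: List[float]) -> List[float]:
    """y = A x for a dense m x k matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul_via_matvec(A: Matrix, B: Matrix) -> Matrix:
    """C = A B (m x k times k x n), one matrix-vector product per column of B."""
    assert len(A[0]) == len(B), "inner dimensions must agree"
    m, n = len(A), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = matvec(A, [row[j] for row in B])  # column j of C = A (column j of B)
        for i in range(m):
            C[i][j] = col[i]
    return C


print(matmul_via_matvec([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
# [[58, 64], [139, 154]]
```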

- Kapil K. Mathur, S. Lennart Johnsson
- International Journal of High Speed Computing
- 1989

Efficient data motion is critical for high-performance computing on distributed-memory architectures. The value of some techniques for efficient data motion is illustrated by identifying generic communication primitives. Further, the efficiency of these primitives is demonstrated on three different applications using the finite element method for unstructured grids…
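The abstract does not name the primitives it identifies. Two that any unstructured-grid finite element code needs, and that are assumed here for illustration, are a gather from nodal values into per-element arrays and a scatter-with-add from element contributions back to the shared nodes; on a distributed-memory machine these are the steps that require communication whenever an element and its nodes live on different processors. The sketch below shows both sequentially; `gather`, `scatter_add`, the connectivity layout and the unit element stiffness are hypothetical.

```python
from typing import List, Sequence


def gather(nodal: Sequence[float],
           connectivity: Sequence[Sequence[int]]) -> List[List[float]]:
    """Copy nodal values into one local array per element."""
    return [[nodal[n] for n in elem] for elem in connectivity]


def scatter_add(elemental: Sequence[Sequence[float]],
                connectivity: Sequence[Sequence[int]],
                num_nodes: int) -> List[float]:
    """Sum per-element contributions into the nodes they share."""
    out = [0.0] * num_nodes
    for elem, vals in zip(connectivity, elemental):
        for n, v in zip(elem, vals):
            out[n] += v
    return out


# Usage: element-by-element product K u on a 1D mesh of two linear elements.
conn = [(0, 1), (1, 2)]
u = [1.0, 2.0, 3.0]
k_local = [[1.0, -1.0], [-1.0, 1.0]]  # assumed unit element stiffness
local_u = gather(u, conn)
local_f = [[sum(k_local[r][c] * ue[c] for c in range(2)) for r in range(2)]
           for ue in local_u]
print(scatter_add(local_f, conn, 3))  # [-1.0, 0.0, 1.0]
```

The same gather, local compute, scatter-add pattern is what makes an element-by-element (matrix-free) operator application possible without assembling the global matrix.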

- Kapil K. Mathur, S. Lennart Johnsson
- Scientific Programming
- 1995

Detailed algorithms for all-to-all broadcast and reduction are given for arrays mapped by binary or binary-reflected Gray code encoding to the processing nodes of binary cube networks. Algorithms are also given for the local computation of the array indices for the communicated data, thereby reducing the demand for communications bandwidth. For the…
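For readers new to the encodings: with binary encoding, array element i sits on cube node i; with binary-reflected Gray code encoding, it sits on node g(i) = i XOR (i >> 1), so consecutive array elements land on neighbouring nodes. The sketch below pairs that mapping with a simple dimension-exchange (recursive-doubling) all-to-all broadcast and then has every node recover array indices locally from source node numbers. It is an illustrative simulation with hypothetical names; it covers only the broadcast, not the reduction, and does not reproduce the paper's bandwidth-optimized algorithms.

```python
from typing import Dict, List


def gray(i: int) -> int:
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def gray_inverse(g: int) -> int:
    """Recover i from gray(i) by XOR-ing all right shifts of g."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def all_to_all_broadcast(array: List[float], d: int) -> List[List[float]]:
    """Simulate all-to-all broadcast on a d-dimensional binary cube.

    array[i] starts on node gray(i); after d exchange steps every node
    holds every element and recomputes each element's index locally.
    """
    p = 1 << d
    assert len(array) == p
    # held[node] maps source node -> value originating there
    held: List[Dict[int, float]] = [dict() for _ in range(p)]
    for i, v in enumerate(array):
        held[gray(i)][gray(i)] = v
    for k in range(d):                    # one cube dimension per step
        new = []
        for node in range(p):
            partner = node ^ (1 << k)     # neighbour across dimension k
            merged = dict(held[node])
            merged.update(held[partner])  # exchange: holdings double each step
            new.append(merged)
        held = new
    # Local index computation: source node s held array element gray_inverse(s).
    result = []
    for node in range(p):
        local = [0.0] * p
        for src, v in held[node].items():
            local[gray_inverse(src)] = v
        result.append(local)
    return result


print([gray(i) for i in range(8)])  # [0, 1, 3, 2, 6, 7, 5, 4]
```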

- Kapil K. Mathur
- 1994

This paper demonstrates that scalability and competitive efficiency can be achieved for unstructured grid finite element applications on distributed memory machines, such as the Connection Machine CM-5 system. The efficiency of finite element solvers is analyzed through two applications: an implicit computational aerodynamics application and an explicit solid…

- Kapil K. Mathur
- 2007