
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. All algorithms…

A data parallel implementation of the multiplication of matrices of arbitrary shapes and sizes is presented. A systolic algorithm based on a rectangular processor layout is used by the implementation. All processors contain submatrices of the same size for a given operand. Matrix-vector multiplication is used as a primitive for local matrix-matrix…
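To illustrate what a systolic matrix multiplication looks like in general, here is a minimal sketch of Cannon's algorithm simulated sequentially in Python. It is not the algorithm from the abstract above: it assumes a square n-by-n grid with one matrix element per simulated processor, whereas the abstract describes a rectangular layout holding submatrices. The function name `cannon_matmul` is hypothetical.

```python
def cannon_matmul(A, B):
    """Simulate Cannon's systolic algorithm for square n x n matrices.

    Assumption: one element per simulated processor on an n x n torus.
    This is an illustrative stand-in, not the implementation the
    abstract describes.
    """
    n = len(A)
    # Initial skew: row i of A is rotated left by i positions,
    # column j of B is rotated up by j positions.
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Each processor multiplies the operands it currently holds
        # and accumulates into its stationary element of C.
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]
        # Systolic step: A moves one position left, B one position up,
        # both with wraparound.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


product = cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

In this sketch the product stays in place while the operands circulate, so every step needs only nearest-neighbor communication on the grid.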

Detailed algorithms for all-to-all broadcast and reduction are given for arrays mapped by binary or binary-reflected Gray code encoding to the processing nodes of binary cube networks. Algorithms are also given for the local computation of the array indices for the communicated data, thereby reducing the demand for communications bandwidth. For the…
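The binary-reflected Gray code mentioned above has a compact standard definition, sketched here in Python. The helper names (`gray_encode`, `gray_decode`, `hamming_distance`) are hypothetical and are not taken from the paper; the sketch only shows why this encoding suits binary cube networks, namely that consecutive array indices map to node addresses differing in exactly one bit, so they sit on directly connected nodes.

```python
def gray_encode(i: int) -> int:
    """Binary-reflected Gray code of a non-negative integer i."""
    return i ^ (i >> 1)


def gray_decode(g: int) -> int:
    """Invert gray_encode by XOR-ing together all right shifts of g."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Map 16 consecutive array indices onto the nodes of a 4-dimensional cube.
codes = [gray_encode(i) for i in range(16)]
```

Because the code is cyclic, the last index also maps to a neighbor of the first, which lets a ring of array blocks be embedded in the cube.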

A data parallel formulation of the finite element method is described. The data structures and the algorithms for stiffness matrix generation and the solution of the equilibrium equations are presented briefly. The generation of the elemental stiffness matrices requires no communication, even though each finite element is distributed over several…
