Kapil K. Mathur

Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in place is described in detail. All algorithms…
Detailed algorithms for all-to-all broadcast and reduction are given for arrays mapped by binary or binary-reflected Gray code encoding to the processing nodes of binary cube networks. Algorithms are also given for the local computation of the array indices for the communicated data, thereby reducing the demand for communications bandwidth. For the…
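
As a rough illustration of the two ideas named in this abstract, the sketch below shows the standard binary-reflected Gray code and a simulated all-to-all broadcast on a binary cube by pairwise exchange along each dimension. It is a generic textbook formulation, not the paper's algorithm; the function names (`gray_encode`, `gray_decode`, `all_to_all_broadcast`) and the set-based simulation are assumptions made for illustration only.

```python
def gray_encode(i: int) -> int:
    """Binary-reflected Gray code of a non-negative integer i."""
    return i ^ (i >> 1)


def gray_decode(g: int) -> int:
    """Inverse of gray_encode: recover the integer from its Gray code."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def all_to_all_broadcast(d: int) -> list[set[int]]:
    """Simulate all-to-all broadcast on a d-dimensional binary cube.

    Each of the 2**d nodes starts with one block, labelled by its own
    address. In step k every node exchanges everything it holds with its
    neighbour across cube dimension k, so the amount held doubles per step.
    Returns, for each node, the set of block labels it holds at the end.
    """
    n = 1 << d
    held = [{node} for node in range(n)]
    for k in range(d):
        # The neighbour across dimension k differs in exactly bit k.
        held = [held[node] | held[node ^ (1 << k)] for node in range(n)]
    return held


if __name__ == "__main__":
    # Consecutive array indices land on neighbouring cube nodes.
    print([format(gray_encode(i), "03b") for i in range(8)])
    # After d exchange steps every node holds all 2**d blocks.
    print(all(blocks == set(range(8)) for blocks in all_to_all_broadcast(3)))
```

With a Gray code mapping, consecutive array indices are placed on nodes whose addresses differ in a single bit, which is why that encoding is attractive on a binary cube. Because the mapping is a closed-form function of the index, each node can compute the indices of received data locally rather than receiving them as part of the message.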
A data parallel formulation of the finite element method is described. The data structures and the algorithms for stiffness matrix generation and the solution of the equilibrium equations are presented briefly. The generation of the elemental stiffness matrices requires no communication, even though each finite element is distributed over several…