ScaLAPACK Tutorial
Jack J. Dongarra and Antoine Petitet

ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance
The content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers, are outlined; alternative approaches to mathematical libraries are suggested, explaining how ScaLAPACK could be integrated into efficient and user-friendly distributed systems.
Evaluating Block Algorithm Variants in LAPACK
This paper describes some of the block factorization routines in LAPACK, a project to take advantage of the greater parallelism and improved data locality of the Level 3 BLAS to improve the ratio of computation to memory references on machines that have a memory hierarchy.
LAPACK Working Note 94: A User's Guide to the BLACS v1.0
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that can be implemented efficiently across a wide range of distributed memory platforms.
MPI: The Complete Reference
MPI: The Complete Reference is an annotated manual for the latest 1.1 version of the standard that illuminates the more advanced and subtle features of MPI and covers such advanced issues in parallel computing and programming as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.
The High Performance Fortran Handbook
High Performance Fortran is a set of extensions to Fortran that express parallel execution at a relatively high level, bringing the convenience of sequential Fortran a step closer to today's complex parallel machines.
Basic Linear Algebra Subprograms for Fortran Usage
A package of 38 low level subprograms for many of the basic operations of numerical linear algebra is presented, intended to be used with FORTRAN.
Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs
This paper describes a model implementation and test software for the Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS). The Level 2 BLAS are targeted at matrix-vector operations.
Solving linear systems on vector and shared memory computers
Topics covered: vector and parallel processing; overview of current high-performance computers; implementation details and overhead; performance analysis, modeling, and measurements; building blocks in linear algebra.
A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
The PBLAS are targeted at distributed vector-vector, matrix-vector, and matrix-matrix operations, with the aim of simplifying the parallelization of linear algebra codes, especially when implemented on top of the sequential BLAS.