ScaLAPACK Tutorial

Jack J. Dongarra and L. Susan Blackford
ScaLAPACK is a library of high-performance linear algebra routines for distributed memory MIMD computers. It is a continuation of the LAPACK project, which designed and produced analogous software for workstations, vector supercomputers, and shared memory parallel computers. The goals of the project are efficiency (to run as fast as possible), scalability (as the problem size and number of processors grow), reliability (including error bounds), portability (across all important parallel machines… 
Effortless and Efficient Distributed Data-Partitioning in Linear Algebra
A new technique to exploit compositions of different data-layout techniques with Hit map, a library for hierarchical-tiling and automatic mapping of arrays, shows that the Hit map version outperforms the ScaLAPACK implementation and is almost as efficient as the best manual MPI implementation.
THCORE: A Parallel Computation Services Model and Runtime System
To make parallel scientific computation applications and libraries convenient to use as software components when developing new applications, a parallel computation service model and a runtime system that supports this model on computer clusters are presented, and some design and implementation issues are discussed.
Portable profiling and tracing for parallel, scientific applications using C++
This paper focuses on the profiling and tracing of C++ applications that have been written using a rich parallel programming framework for high-performance scientific computing and addresses issues of class-based profiling, instrumentation of templates, runtime function identification, and polymorphic (type-based) profiling.
Efficiency Comparison of Data-Parallel Programming and Message-Passing Paradigm for Molecular Dynamics Simulation
Efficiency results for molecular dynamics simulations obtained on multiprocessor supercomputers are compared using two approaches: data-parallel programming and the message-passing paradigm.
Parallelized Hybrid Method With Higher-Order MoM and PO for Analysis of Phased Array Antennas on Electrically Large Platforms
An efficient parallel hybrid solver consisting of the method of moments (MoM) with higher-order basis functions (HOBs) and physical optics (PO) is proposed for the analysis of complicated phased
Parameter Estimation in Groundwater Models Using Proper Orthogonal Decomposition
A new Proper Orthogonal Decomposition reduced order model for saturated groundwater flow is developed, and that model is applied to an inverse problem for the hydraulic conductivity field.
Data assimilation for wildland fires
Two wildland fire models and methods for assimilating data in those models are presented and data assimilation methods are developed combining EnKF with Tikhonov regularization to avoid nonphysical states and with the ideas of registration and morphing from image processing to allow large position corrections.


ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance
The content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers, are outlined and alternative approaches to mathematical libraries are suggested, explaining how ScaLAPACK could be integrated into efficient and user-friendly distributed systems.
Evaluating Block Algorithm Variants in LAPACK
This paper describes some of the block factorization routines in LAPACK, a project to take advantage of the greater parallelism and improved data locality of the Level 3 BLAS to improve the ratio of computation to memory references on machines that have a memory hierarchy.
LAPACK Working Note 94: A User's Guide to the BLACS v1.0
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented e
MPI: The Complete Reference
MPI: The Complete Reference is an annotated manual for the latest 1.1 version of the standard that illuminates the more advanced and subtle features of MPI and covers such advanced issues in parallel computing and programming as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.
The High Performance Fortran Handbook
High Performance Fortran is a set of extensions to Fortran expressing parallel execution at a relatively high level that brings the convenience of sequential Fortran a step closer to today's complex parallel machines.
Basic Linear Algebra Subprograms for Fortran Usage
A package of 38 low level subprograms for many of the basic operations of numerical linear algebra is presented, intended to be used with FORTRAN.
Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs
This paper describes a model implementation and test software for the Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS). Level 2 BLAS are targeted at matrix-vector operations with the aim of
Solving linear systems on vector and shared memory computers
Vector and parallel processing; overview of current high-performance computers; implementation details and overhead; performance: analysis, modeling and measurements; building blocks in linear algebra
MPI: A Message-Passing Interface Standard
This document contains all the technical features proposed for the interface. The goal of the Message Passing Interface, simply stated, is to develop a widely used standard for writing message-passing programs.