This paper describes the ATLAS (Automatically Tuned Linear Algebra Software) project, as well as the fundamental principles that underlie it. ATLAS is an instantiation of a new paradigm in high performance library production and maintenance, which we term AEOS (Automated Empirical Optimization of Software); this style of library management has been created…

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it…

LAPACK (Linear Algebra PACKage) is a statically cache-blocked library, where the blocking factor (NB) is determined by the service routine ILAENV. Users are encouraged to tune NB to maximize performance on their platform/BLAS (the BLAS are LAPACK's computational engine), but in practice very few users do so (both because it is hard, and because its…

This paper discusses both the theoretical and statistical errors obtained by various well-known dot products, from the canonical to pairwise algorithms, and introduces a new and more general framework that we have named superblock which subsumes them and permits a practitioner to make trade-offs between computational performance, memory usage, and error…

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it…

- R. Clint Whaley
- 1997

In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high performance Level 3 BLAS. The Level 3 BLAS have excellent scaling, but panel processing tends to be bus bound, and thus scales with bus speed rather than the number of processors…

There are a few application areas which remain almost untouched by the historical and continuing advancement of compilation research. For the extremes of optimization required for high performance computing on one end, and embedded systems at the opposite end of the spectrum, many critical routines are still hand-tuned, often directly in assembly. At the…

- R. Clint Whaley
- 1994

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it…