This paper describes the ATLAS (Automatically Tuned Linear Algebra Software) project, as well as the fundamental principles that underlie it. ATLAS is an instantiation of a new paradigm in high-performance library production and maintenance, which we term AEOS (Automated Empirical Optimization of Software); this style of library management has been created…

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it…

- R. Clint Whaley
- IMCSIT
- 2008

LAPACK (Linear Algebra PACKage) is a statically cache-blocked library, where the blocking factor (NB) is determined by the service routine ILAENV. Users are encouraged to tune NB to maximize performance on their platform/BLAS (the BLAS are LAPACK's computational engine), but in practice very few users do so (both because it is hard, and because its…

- R. Clint Whaley
- 1994

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it…

- Anthony M. Castaldo, R. Clint Whaley, Anthony T. Chronopoulos
- SIAM J. Scientific Computing
- 2008

This paper discusses both the theoretical and statistical errors obtained by various well-known dot products, from the canonical to pairwise algorithms, and introduces a new and more general framework that we have named superblock which subsumes them and permits a practitioner to make trade-offs between computational performance, memory usage, and error…

- R. Clint Whaley
- Encyclopedia of Parallel Computing
- 2011

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it…

- R. Clint Whaley
- 1997

- Anthony M. Castaldo, R. Clint Whaley
- PPOPP
- 2010

In LAPACK, many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high-performance Level 3 BLAS. The Level 3 BLAS have excellent scaling, but panel processing tends to be bus-bound, and thus scales with bus speed rather than the number of processors…