This paper describes the ATLAS (Automatically Tuned Linear Algebra Software) project, as well as the fundamental principles that underlie it. ATLAS is an instantiation of a new paradigm in high performance library production and maintenance, which we term AEOS (Automated Empirical Optimization of Software); this style of library management has been created…

- James Demmel, Jack J. Dongarra, +5 authors Katherine A. Yelick
- Proceedings of the IEEE
- 2005

One of the main obstacles to the efficient solution of scientific problems is the problem of tuning software, both to the available architecture and to the user problem at hand. We describe approaches for obtaining tuned high-performance kernels, and for automatically choosing suitable algorithms. Specifically, we describe the generation of dense and sparse…

- R. Clint Whaley, Anthony M. Castaldo
- Softw., Pract. Exper.
- 2008

Key computational kernels must run near their peak efficiency for most high performance computing (HPC) applications. Getting this level of efficiency has always required extensive tuning of the kernel on a particular platform of interest. The success or failure of an optimization is usually measured by invoking a timer. Understanding how to build reliable…

- Anthony M. Castaldo, R. Clint Whaley
- PPOPP
- 2010

In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high performance Level 3 BLAS. The Level 3 BLAS have excellent scaling, but panel processing tends to be bus bound, and thus scales with bus speed rather than the number of processors…

- R. Clint Whaley
- IMCSIT
- 2008

LAPACK (Linear Algebra PACKage) is a statically cache-blocked library, where the blocking factor (NB) is determined by the service routine ILAENV. Users are encouraged to tune NB to maximize performance on their platform/BLAS (the BLAS are LAPACK’s computational engine), but in practice very few users do so (both because it is hard, and because its…

- Anthony M. Castaldo, R. Clint Whaley, Anthony T. Chronopoulos
- SIAM J. Scientific Computing
- 2008

This paper discusses both the theoretical and statistical errors obtained by various well-known dot products, from the canonical to pairwise algorithms, and introduces a new and more general framework that we have named superblock which subsumes them and permits a practitioner to make trade-offs between computational performance, memory usage, and error…

There are a few application areas which remain almost untouched by the historical and continuing advancement of compilation research. For the extremes of optimization required for high performance computing on one end, and embedded systems at the opposite end of the spectrum, many critical routines are still hand-tuned, often directly in assembly. At the…

- Anthony M. Castaldo, R. Clint Whaley
- 2009 IEEE International Symposium on Parallel…
- 2009

Using the well-known ATLAS and LAPACK dense linear algebra libraries, we demonstrate that the parallel management overhead (PMO) can grow with problem size on even statically scheduled parallel programs with minimal task interaction. Therefore, the widely held view that these thread management issues can be ignored in such computationally intensive…

- Majedul Haque Sujon, R. Clint Whaley, Qing Yi
- Proceedings of the 22nd International Conference…
- 2013

Modern architectures increasingly rely on SIMD vectorization to improve performance for floating point intensive scientific applications. However, existing compiler optimization techniques for automatic vectorization are inhibited by the presence of unknown control flow surrounding partially vectorizable computations. In this paper, we present a new…

- R. Clint Whaley
- Software Automatic Tuning, From Concepts to State…
- 2010

This paper describes the widely-used ATLAS (Automatically Tuned Linear Algebra Software) project as it stands today. ATLAS is an instantiation of a paradigm in high performance library production and maintenance, which we term AEOS (Automated Empirical Optimization of Software); this style of library management has been created in order to allow software to…