As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose …
The emergence and continuing use of multi-core architectures and graphics processing units require changes in the existing software, and sometimes even a redesign of the established algorithms, in order to take advantage of now-prevailing parallelism. Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) and Matrix Algebra on GPU and Multicore Architectures (MAGMA) …
The Sony/Toshiba/IBM (STI) CELL processor introduces pioneering solutions in processor architecture. At the same time, it presents new challenges for the development of numerical algorithms. One is the effective exploitation of the differential between the speed of single- and double-precision arithmetic; the other is the efficient parallelization between …
The dataflow model is gaining popularity as a paradigm for programming multicore processors and multi-socket systems of such processors. This work proposes a programming interface and an implementation for a dataflow-based scheduler, which dispatches tasks dynamically at runtime. The scheduler relies on data dependency analysis between tasks in a sequential …
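The abstract above describes a scheduler that derives task dependencies from the data each task reads and writes, as submitted in sequential order. A minimal sketch of that idea, in plain Python (class and method names are illustrative, not the interface from the paper), builds the dependency graph from read-after-write, write-after-write, and write-after-read hazards and then dispatches tasks only once their dependencies have completed:

```python
class Task:
    """A unit of work with declared data accesses."""
    def __init__(self, name, func, reads=(), writes=()):
        self.name, self.func = name, func
        self.reads, self.writes = reads, writes
        self.deps = set()          # filled in by the scheduler at submit time


class DataflowScheduler:
    """Derives task dependencies from reads/writes in sequential submission order."""
    def __init__(self):
        self.last_writer = {}      # data item -> task that last wrote it
        self.readers = {}          # data item -> tasks reading it since last write
        self.tasks = []

    def submit(self, task):
        for item in task.reads:                    # read-after-write
            if item in self.last_writer:
                task.deps.add(self.last_writer[item])
        for item in task.writes:
            if item in self.last_writer:           # write-after-write
                task.deps.add(self.last_writer[item])
            task.deps.update(self.readers.get(item, ()))  # write-after-read
        for item in task.reads:
            self.readers.setdefault(item, set()).add(task)
        for item in task.writes:
            self.last_writer[item] = task
            self.readers[item] = set()
        self.tasks.append(task)

    def run(self):
        """Dispatch every task whose dependencies are done; return execution order."""
        done, order, pending = set(), [], list(self.tasks)
        while pending:
            ready = [t for t in pending if t.deps <= done]
            for t in ready:
                t.func()
                done.add(t)
                order.append(t.name)
                pending.remove(t)
        return order


# Usage: t3 overwrites "A", so it must wait for t2, which reads "A" after t1 wrote it.
log = []
sched = DataflowScheduler()
t1 = Task("t1", lambda: log.append("t1"), writes=("A",))
t2 = Task("t2", lambda: log.append("t2"), reads=("A",), writes=("B",))
t3 = Task("t3", lambda: log.append("t3"), writes=("A",))
for t in (t1, t2, t3):
    sched.submit(t)
order = sched.run()
```

A real runtime would dispatch the ready set to worker threads in parallel; this sequential loop only demonstrates the dependency analysis.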
It is difficult to estimate the magnitude of the discontinuity that the high-performance computing (HPC) community is about to experience because of the emergence of the next generation of multi-core and heterogeneous processor designs [4]. For at least two decades, HPC programmers have taken for granted that each successive generation of microprocessors …
Recent microprocessors exhibit performance for 32-bit floating-point arithmetic (single precision) that is substantially higher than for 64-bit floating-point arithmetic (double precision). Examples include Intel's Pentium 4 and Pentium M processors, AMD's Opteron architectures, and IBM's Cell Broadband Engine processor. When working in …
Matrix multiplication is one of the most common numerical operations, especially in the area of dense linear algebra, where it forms the core of many important algorithms, including solvers of linear systems of equations, least-squares problems, and singular value and eigenvalue computations. The STI CELL processor exceeds the capabilities of any other processor …
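High-performance matrix multiplication kernels, including those written for the CELL's SPEs, operate on small tiles that fit in local memory and accumulate tile products into the output. A NumPy sketch of that blocked structure (an illustration of the general technique, not the CELL-specific SIMD kernel) looks like this:

```python
import numpy as np

def blocked_matmul(A, B, nb=64):
    """Compute C = A @ B by accumulating nb-by-nb tile products.

    Each C tile is updated with the product of an A tile and a B tile,
    mirroring how a local-store-resident kernel would stream tiles in.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(0, m, nb):          # rows of C
        for j in range(0, n, nb):      # columns of C
            for p in range(0, k, nb):  # accumulate along the inner dimension
                C[i:i+nb, j:j+nb] += A[i:i+nb, p:p+nb] @ B[p:p+nb, j:j+nb]
    return C
```

On the CELL, each tile product would be a hand-tuned single-precision SIMD kernel and tiles would be moved by DMA; the loop nest above captures only the blocking scheme.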
By using a combination of 32-bit and 64-bit floating-point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to exotic technologies such as Field-Programmable Gate Arrays (FPGAs) …
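The standard form of this mixed-precision approach is iterative refinement: solve the system in fast single precision, then repeatedly compute the residual in double precision and solve for a correction in single precision. A NumPy sketch under simplifying assumptions (a dense, reasonably well-conditioned system; for brevity it re-solves in single precision each iteration rather than reusing a stored LU factorization, as a real implementation would):

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=30):
    """Solve A x = b using single-precision solves refined in double precision."""
    A32 = A.astype(np.float32)
    b32 = b.astype(np.float32)
    # Initial solution from the cheap single-precision solve.
    x = np.linalg.solve(A32, b32).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                       # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                           # converged to double-precision accuracy
        # Correction from another single-precision solve.
        c = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += c
    return x
```

For well-conditioned systems, a few refinement steps recover full double-precision accuracy while the dominant O(n^3) factorization cost is paid at single-precision speed.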