—The two-dimensional (2D) forward/inverse discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), discrete Walsh-Hadamard transform (DWHT), play a fundamental role in many practical applications. Due to the separability property, all these transforms can be uniquely defined as a… (More)
—OpenCL (Open Computing Language) is a framework for general-purpose parallel programming. Programs written in OpenCL are functionally portable across multiple processors including CPUs, GPUs, and also FPGAs. Using an auto-tuning technique makes performance of OpenCL programs also portable on different processors. We have developed an auto-tuning system… (More)
Contribution ◮ Blocked algorithm for the all-pairs shortest (APSP) paths problem for a hybrid CPU-GPU system. ◮ Applicable to solve the APSP problem even when the required memory size is larger than GPU's memory capacity. ◮ Fastest among existing APSP solutions on large dense graphs. ◮ Discussion on required memory bandwidth for the blocked algorithm. The… (More)
SUMMARY The All-Pairs Shortest Paths (APSP) problem is a graph problem which can be solved by a three-nested loop program. The Cell Broadband Engine (Cell/B.E.) is a heterogeneous multi-core processor that offers the high single precision floating-point performance. In this paper, a solution of the APSP problem on the Cell/B.E. is presented. To maximize the… (More)
—Recent changes in computational sciences force reevaluation of the role of dense matrix multiplication. Among others, this resulted in a proposal to consider generalized matrix multiplication, based on the theory of algebraic semirings. The aim of this note is to outline an initial object oriented model of the generalized matrix-multiply-add operation.