The two-dimensional (2D) forward/inverse discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), and discrete Walsh-Hadamard transform (DWHT) play a fundamental role in many practical applications. Due to the separability property, all these transforms can be uniquely defined as a…
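Separability means a 2D transform can be computed as 1D transforms along every row followed by 1D transforms along every column (the row-column method). The sketch below shows this for the DFT only, using a naive O(N²) 1D kernel for clarity and checking it against the direct double-sum definition. The function names are illustrative assumptions, not taken from the paper. The same row-column pattern applies to the other listed transforms by swapping in their 1D kernel.

```python
import cmath
from typing import List

Matrix = List[List[complex]]


def dft_1d(x: List[complex]) -> List[complex]:
    """Naive O(N^2) 1D DFT: X[k] = sum_m x[m] * exp(-2*pi*i*k*m/N)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def dft_2d_separable(x: Matrix) -> Matrix:
    """2D DFT by the row-column method: 1D DFT of every row, then of every column."""
    rows = [dft_1d(r) for r in x]
    n_rows, n_cols = len(rows), len(rows[0])
    # Transform each column of the row-transformed matrix.
    cols = [dft_1d([rows[i][j] for i in range(n_rows)]) for j in range(n_cols)]
    # cols[j][k1] holds output (k1, j); transpose back to row-major order.
    return [[cols[j][i] for j in range(n_cols)] for i in range(n_rows)]


def dft_2d_direct(x: Matrix) -> Matrix:
    """Reference 2D DFT straight from the double-sum definition."""
    m, n = len(x), len(x[0])
    return [
        [
            sum(
                x[a][b] * cmath.exp(-2j * cmath.pi * (k1 * a / m + k2 * b / n))
                for a in range(m)
                for b in range(n)
            )
            for k2 in range(n)
        ]
        for k1 in range(m)
    ]
```

Because the 2D kernel factors into a row kernel times a column kernel, both functions agree up to floating-point rounding. The separable version is what makes fast row-column implementations possible.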

OpenCL (Open Computing Language) is a framework for general-purpose parallel programming. Programs written in OpenCL are functionally portable across multiple processors, including CPUs, GPUs, and FPGAs. An auto-tuning technique can also make the performance of OpenCL programs portable across different processors. We have developed an auto-tuning system…
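The abstract is cut off before it describes the tuning system. A common form of auto-tuning is an empirical search: run each candidate configuration (for example, an OpenCL work-group size), measure it, and keep the fastest. The plain-Python sketch below assumes that exhaustive-search approach. `autotune`, `wall_clock`, and the `timer` hook are hypothetical names, and a real OpenCL tuner would time kernel launches on the target device rather than Python callables.

```python
import time
from typing import Callable, Iterable, Tuple, TypeVar

P = TypeVar("P")


def wall_clock(run: Callable[[P], object], param: P) -> float:
    """Time one execution of run(param) in seconds (hypothetical helper)."""
    start = time.perf_counter()
    run(param)
    return time.perf_counter() - start


def autotune(
    candidates: Iterable[P],
    run: Callable[[P], object],
    repeats: int = 3,
    timer: Callable[[Callable[[P], object], P], float] = wall_clock,
) -> Tuple[P, float]:
    """Exhaustively benchmark each candidate; return (best_param, best_time).

    The minimum over `repeats` runs is kept per candidate to damp timing noise.
    """
    params = list(candidates)
    if not params:
        raise ValueError("no candidates to tune")
    best_param, best_time = params[0], float("inf")
    for param in params:
        cost = min(timer(run, param) for _ in range(repeats))
        if cost < best_time:
            best_param, best_time = param, cost
    return best_param, best_time


if __name__ == "__main__":
    # Hypothetical tunable "kernel": sum a range in tiles of the given size.
    def tiled_sum(tile: int) -> int:
        data = range(1 << 16)
        return sum(sum(data[s:s + tile]) for s in range(0, len(data), tile))

    print(autotune([64, 256, 1024, 4096], tiled_sum))
```

The `timer` parameter is injectable so the search logic can be exercised with a deterministic cost model; on a real device it would wrap the kernel launch and its completion event.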

Contributions:

- A blocked algorithm for the all-pairs shortest paths (APSP) problem on a hybrid CPU-GPU system.
- Applicable to the APSP problem even when the required memory exceeds the GPU's memory capacity.
- The fastest among existing APSP solutions on large dense graphs.
- A discussion of the memory bandwidth required by the blocked algorithm.

The…
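The snippet does not spell out the blocked algorithm. A widely used blocked formulation of APSP is the three-phase blocked Floyd-Warshall: for each pivot block, update the diagonal block, then the pivot row and column blocks, then all remaining blocks. The sketch below assumes that formulation and runs sequentially in plain Python. `blocked_floyd_warshall` and the block size `b` are illustrative; on a GPU, `b` would be chosen so a block fits in fast on-chip memory, and the phase-3 updates (the bulk of the work) are the natural ones to offload.

```python
from typing import List

INF = float("inf")
Matrix = List[List[float]]


def _update(d: Matrix, rows: range, cols: range, ks: range) -> None:
    """Relax d[i][j] through each intermediate vertex k in ks (k outermost)."""
    for k in ks:
        dk = d[k]
        for i in rows:
            dik = d[i][k]
            if dik == INF:
                continue
            di = d[i]
            for j in cols:
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]


def blocked_floyd_warshall(dist: Matrix, b: int) -> Matrix:
    """Three-phase blocked Floyd-Warshall with block size b (last block may be smaller)."""
    n = len(dist)
    d = [row[:] for row in dist]
    blocks = [range(s, min(s + b, n)) for s in range(0, n, b)]
    for p, kb in enumerate(blocks):
        # Phase 1: the diagonal (pivot) block depends only on itself.
        _update(d, kb, kb, kb)
        # Phase 2: pivot-row and pivot-column blocks use the finished pivot block.
        for q, other in enumerate(blocks):
            if q != p:
                _update(d, kb, other, kb)  # block in the pivot row
                _update(d, other, kb, kb)  # block in the pivot column
        # Phase 3: every other block uses its pivot-row and pivot-column blocks.
        for qi, ib in enumerate(blocks):
            for qj, jb in enumerate(blocks):
                if qi != p and qj != p:
                    _update(d, ib, jb, kb)
    return d
```

Only the order in which (i, j, k) updates are applied changes relative to the plain triple loop, so the result is identical; the gain is that each phase touches a few b×b blocks at a time.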

SUMMARY The All-Pairs Shortest Paths (APSP) problem is a graph problem that can be solved by a three-nested-loop program. The Cell Broadband Engine (Cell/B.E.) is a heterogeneous multi-core processor that offers high single-precision floating-point performance. This paper presents a solution to the APSP problem on the Cell/B.E. To maximize the…
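The abstract does not name the three-nested-loop program; the standard one for APSP is the Floyd-Warshall algorithm, sketched below in plain Python under that assumption. Entry `dist[i][j]` is the edge weight from i to j (`INF` if there is no edge, 0 on the diagonal). The function name is illustrative.

```python
from typing import List

INF = float("inf")


def floyd_warshall(dist: List[List[float]]) -> List[List[float]]:
    """Return shortest-path distances between all vertex pairs."""
    n = len(dist)
    d = [row[:] for row in dist]  # leave the input untouched
    for k in range(n):            # allow vertex k as an intermediate hop
        for i in range(n):
            dik = d[i][k]
            for j in range(n):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    return d
```

The inner update `d[i][j] = min(d[i][j], d[i][k] + d[k][j])` has the same loop structure as matrix multiplication with (min, +) in place of (+, ×), which is why APSP can benefit from processors with high arithmetic throughput.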

In this paper, the index space of the (n×n)-matrix multiply-add problem C = C + A·B is represented as a 3D n×n×n torus. All possible time-scheduling functions that activate the computation and data rolling inside the 3D torus index space are determined. To maximize efficiency when solving a single problem, we map the computations into the 2D n×n toroidal…
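The snippet does not say which time-scheduling function the authors pick for the 2D torus. One well-known linear schedule that projects the n×n×n index space onto an n×n toroidal array is Cannon's algorithm: at step t, processing element (PE) (i, j) handles index k = (i + j + t) mod n, and operands roll between neighbouring PEs. The sketch below simulates that schedule sequentially in Python; `torus_multiply_add` is a hypothetical name, and this is not claimed to be the paper's mapping.

```python
from typing import List

Matrix = List[List[int]]


def torus_multiply_add(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Simulate C = C + A*B on an n x n torus of processing elements (PEs).

    PE (i, j) owns c[i][j]. At step t it uses the operands with index
    k = (i + j + t) mod n, a Cannon-style time-scheduling function.
    """
    n = len(a)
    # Initial skew: PE (i, j) starts with A[i][k] and B[k][j] for k = (i + j) mod n.
    a_loc = [[a[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c_loc = [row[:] for row in c]
    for _ in range(n):
        # Every PE performs one multiply-add in lockstep.
        for i in range(n):
            for j in range(n):
                c_loc[i][j] += a_loc[i][j] * b_loc[i][j]
        # Roll A one PE to the left and B one PE upward, with wrap-around.
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c_loc
```

After t rolls, PE (i, j) holds A[i][k] and B[k][j] with the same k = (i + j + t) mod n, so over n steps each PE sees every k exactly once and accumulates the full dot product using only nearest-neighbour transfers.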