
- Kazuya Matsumoto, Naohito Nakasato, Stanislav G. Sedukhin
- 2012 IEEE 6th International Symposium on Embedded…
- 2012

This paper presents results from an implementation of a code generator for fast general matrix multiply (GEMM) kernels. Given a set of parameters, the code generator produces the corresponding GEMM kernel written in OpenCL. The produced kernels are optimized for high-performance execution on AMD GPUs. Access latencies to GPU global memory is…

- Kazuya Matsumoto, Naohito Nakasato, Stanislav G. Sedukhin
- 2011 IEEE International Conference on High…
- 2011

This paper presents a blocked algorithm for the all-pairs shortest paths (APSP) problem for a hybrid CPU-GPU system. In the blocked APSP algorithm, the amount of data communication between CPU (host) memory and GPU memory is minimized. When a problem size (the number of vertices in a graph) is large enough compared with a blocking factor, the blocked…

- Kazuya Matsumoto, Naohito Nakasato, Stanislav G. Sedukhin
- 2012 SC Companion: High Performance Computing…
- 2012

OpenCL (Open Computing Language) is a framework for general-purpose parallel programming. Programs written in OpenCL are functionally portable across multiple processors, including CPUs, GPUs, and FPGAs. Using an auto-tuning technique makes the performance of OpenCL programs portable across different processors as well. We have developed an auto-tuning system with…

- Abhijeet A. Ravankar, Stanislav G. Sedukhin
- 2010 First International Conference on Networking…
- 2010

In this paper we propose a novel “Mesh-of-Tori” cellular interconnection network for scalable and massively parallel array processors with frontal plane I/O. The unit (called “m-Cell”) in this topology is the smallest double (for 2D case) or triple (for 3D) folded torus, which forms the basic ‘tile’. The Cells…

- Ahmed S. Zekri, Stanislav G. Sedukhin
- Proceedings 20th IEEE International Parallel…
- 2006

In this paper, the index space of the (n × n)-matrix multiply-add problem C = C + A·B is represented as a 3D n × n × n torus. All possible time-scheduling functions to activate the computation and data rolling inside the 3D torus index space are determined. To maximize efficiency when solving a single problem, we mapped the computations…

- Stanislav G. Sedukhin, Igor S. Sedukhin
- CONPAR
- 1994

- Yuki Ikegaki, Toshiaki Miyazaki, Stanislav G. Sedukhin
- IEICE Transactions
- 2011

- Stanislav G. Sedukhin, Ahmed S. Zekri, Toshiaki Miyazaki
- 2010 39th International Conference on Parallel…
- 2010

The two-dimensional (2D) forward/inverse discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), and discrete Walsh-Hadamard transform (DWHT) play a fundamental role in many practical applications. Due to the separability property, all these transforms can be uniquely defined as a…

- Stanislav G. Sedukhin, Toshiaki Miyazaki
- CATA
- 2010

- Kazuya Matsumoto, Stanislav G. Sedukhin
- IEICE Transactions
- 2009

The All-Pairs Shortest Paths (APSP) problem is a graph problem that can be solved by a triply nested loop program. The Cell Broadband Engine (Cell/B.E.) is a heterogeneous multi-core processor that offers high single-precision floating-point performance. In this paper, a solution of the APSP problem on the Cell/B.E. is presented. To maximize the…
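
The abstract above describes APSP as solvable by a program of three nested loops. It does not name the algorithm in the visible text; the sketch below assumes the classic Floyd-Warshall formulation, which has that shape. The function name `floyd_warshall` and the sample graph are illustrative, and this plain Python version makes no attempt at the Cell/B.E. optimizations the paper discusses.

```python
import math

INF = math.inf  # marks "no direct edge" in the input matrix


def floyd_warshall(dist):
    """Return all-pairs shortest path lengths for a dense distance matrix.

    dist[i][j] is the weight of edge i -> j, INF if absent, 0 on the diagonal.
    The input matrix is not modified.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    for k in range(n):            # allowed intermediate vertex
        for i in range(n):        # source vertex
            for j in range(n):    # destination vertex
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Hypothetical 4-vertex directed graph used only for illustration.
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
result = floyd_warshall(graph)
```

The outermost loop over `k` must stay outermost, since iteration `k` relies on shortest paths that use only intermediate vertices below `k`. The inner two loops are independent of each other for a fixed `k`, which is what makes the problem attractive for parallel hardware.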