- Naohito Nakasato
- SIGMETRICS Performance Evaluation Review
- 2011

We present benchmark results of optimized dense matrix multiplication kernels for Cypress GPU. We write general matrix multiply (GEMM) kernels for single (SP), double (DP) and double-double (DDP)…

- Kazuya Matsumoto, Naohito Nakasato, Stanislav G. Sedukhin
- 2012 IEEE 6th International Symposium on Embedded…
- 2012

This paper presents results of an implementation of a code generator for fast general matrix multiply (GEMM) kernels. When a set of parameters is given, the code generator produces the corresponding…

- Kazuya Matsumoto, Naohito Nakasato, Stanislav G. Sedukhin
- 2011 IEEE International Conference on High…
- 2011

This paper presents a blocked algorithm for the all-pairs shortest paths (APSP) problem for a hybrid CPU-GPU system. In the blocked APSP algorithm, the amount of data communication between CPU (host)…

- Kazuya Matsumoto, Naohito Nakasato, Stanislav G. Sedukhin
- 2012 SC Companion: High Performance Computing…
- 2012

OpenCL (Open Computing Language) is a framework for general-purpose parallel programming. Programs written in OpenCL are functionally portable across multiple processors including CPUs, GPUs, and…

- Tsuyoshi Hamada, Naohito Nakasato
- 13th Annual IEEE Symposium on Field-Programmable…
- 2005

We have developed the PGR (processors generator for reconfigurable system) package, which generates (a) a suitable configuration file for the FPGAs, (b) the C source code for interfacing with an FPGA-based…

- Naohito Nakasato, Go Ogiya, Yohei Miki, Masao Mori, Ken'ichi Nomoto
- ArXiv
- 2012

Heterogeneous CPU-GPU nodes are becoming popular in HPC clusters. We need to rethink algorithms and optimization techniques for such systems depending on the relative performance of CPU vs. GPU. In…

- Naohito Nakasato
- 2010

We present benchmark results of optimized dense matrix multiplication kernels for Cypress GPU. We write general matrix multiply (GEMM) kernels for single (SP), double (DP) and double-double (DDP)…

- Naohito Nakasato
- J. Comput. Science
- 2012

The kd-tree is a fundamental tool in computer science. Among other applications, the application of kd-tree search (by the tree method) to the fast evaluation of particle interactions and neighbor…

- Tsuyoshi Hamada, Naohito Nakasato
- International Conference on Field Programmable…
- 2005

In this paper, we describe a methodology for implementing an FPGA-based accelerator (FBA) from a high-level specification language. We have constructed a software package specially tuned for…

- Naohito Nakasato, Tsuyoshi Hamada
- 13th Annual IEEE Symposium on Field-Programmable…
- 2005

The smoothed particle hydrodynamics (SPH) method is a widely used particle simulation scheme in astrophysical hydrodynamics simulations. Since a possible problem size of SPH simulations is limited by…