Kentaro Sano

This paper presents an FPGA-based flow solver built on a systolic architecture. We show that the fractional-step method employing central difference schemes can be expressed as a systolic algorithm, and therefore that the systolic architecture is suitable for a dedicated flow-solver processor. We have designed a 2D systolic array of cells, each of which…
Stencil computation is one of the important kernels in scientific computations. However, its sustained performance is limited by memory bandwidth, especially on multicore microprocessors and graphics processing units (GPUs), because of its small operational intensity. In this paper, we present a custom computing machine (CCM), called a…
This paper presents an FPGA-based streaming computation for the lattice Boltzmann method (LBM) to simulate fluid flow with floating-point calculations. LBM is suitable for streaming computation because of its parallelism and regularity. We optimize the equations of LBM, and then formulate a streaming computation. To design an efficient data path for throughput…
Stencil computation is one of the important kernels in scientific computations. However, its sustained performance is limited by memory bandwidth, especially on multi-core microprocessors and GPGPUs, due to its small operational intensity. In this paper, we propose a scalable streaming-array (SSA) of simple soft-processors for high-performance stencil…
The 3DCGiRAM is a 3D-graphic accelerator designed for photo-realistic image synthesis. The 3DCGiRAM is equipped with graphics processing units that perform ray-object intersection calculations and intensity calculations for rays traced over a 3D virtual object space. It also has a hardware-accelerated 3D line generator, which effectively finds objects in a…
This paper presents a performance model of an LBM accelerator to be implemented on a tightly-coupled FPGA cluster. In strong scaling, each accelerator node performs less computation as the number of nodes increases, and consequently communication overhead becomes apparent and limits scalability. Our tightly-coupled FPGA cluster has the 1D ring of the…
This paper presents segment-parallel prediction for high-throughput compression and decompression of floating-point data streams on an FPGA-based LBM accelerator. To enhance the actual memory I/O bandwidth of the accelerator, we focus on prediction-based compression of floating-point data streams. Although hardware implementation is essential…