Corpus ID: 236447526

An FPGA cached sparse matrix vector product (SpMV) for unstructured computational fluid dynamics simulations

Guillermo Oyarzun, Daniel Peyrolon, Carlos Álvarez, Xavier Martorell
Field Programmable Gate Arrays (FPGAs) make it possible to build algorithm-specific architectures that improve a code's FLOP-per-watt ratio. Such devices are regaining interest thanks to new tools that ease their programming, such as OmpSs. The computational fluid dynamics community is continually investigating new architectures that can improve the performance of its algorithms. Commonly, those algorithms have a low arithmetic intensity and reach only a small percentage of peak performance. The sparse…


A Vector Caching Scheme for Streaming FPGA SpMV Accelerators
A hardware-software caching scheme that exploits preprocessing to enable performant and area-effective SpMV acceleration and can achieve nearly stall-free execution, with an average stall time of 1.1%.
A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs
This paper describes an FPGA-based SpMxV kernel that is scalable to efficiently utilize the available memory bandwidth and computing resources and is able to achieve higher performance than its CPU and GPU counterparts, while using only 64 single-precision processing elements.
Optimising Sparse Matrix Vector multiplication for large scale FEM problems on FPGA
This work proposes an architecture and an automated customisation method to detect and optimise the architecture for block diagonal sparse matrices, enabling the solution of larger problems than previously possible and enabling the applicability of FPGAs to more interesting HPC problems.
A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis
The main goal of this paper is to show that FPGAs can provide comparable performance for memory-bound applications to that of the corresponding CPUs and GPUs but with significantly less energy consumption.
Random access schemes for efficient FPGA SpMV acceleration
A hardware-software caching scheme named NCVCS is proposed that combines software preprocessing with a nonblocking cache to enable highly efficient SpMV accelerators with modest on-chip memory requirements and effectively combines the high efficiency from on-chip accesses with the capability of working with large matrices from off-chip accesses.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
This work examines sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs, and presents several optimization strategies especially effective for the multicore environment.
FPGA based acceleration of computational fluid flow simulation on unstructured mesh geometry
A Field Programmable Gate Array (FPGA) based framework is described that accelerates the simulation of complex physical spatio-temporal phenomena, solving the Euler equations on an unstructured mesh using a finite volume technique.
Improving SpMV Performance on FPGAs through Lossless Nonzero Compression
Sparse matrix vector multiplication (SpMV) is an important kernel in many areas of scientific computing, especially as a building block for iterative linear system solvers. We study how lossless…
Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
A portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation of incompressible turbulent flows using unstructured hybrid meshes, based on decomposing the nonlinear operators into a concatenation of two SpMV operations.
Heterogeneous CPU/GPU co-execution of CFD simulations on the POWER9 architecture: Application to airplane aerodynamics
This paper describes in detail the parallelization strategy implemented to fully exploit the different levels of parallelism, together with a novel co-execution method for the efficient utilization of heterogeneous CPU/GPU architectures.