This paper covers the multi-threaded parallel processing of a sparse triangular solver for a linear system with a sparse coefficient matrix, focusing on its application to a parallel ICCG solver. We propose algebraic block multi-color ordering, which is an enhanced version of block multi-color ordering for general unstructured analysis. We present blocking…
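As a rough illustration of why multi-color ordering helps parallelize a sparse triangular solve, the sketch below colors the unknowns of a small sparse pattern so that no two coupled unknowns share a color. Unknowns of one color then have no mutual dependencies and could be processed by separate threads. This is plain greedy point-wise multi-coloring, not the algebraic block multi-color ordering proposed in the paper; the function names and the 2x3 grid pattern are assumptions made for the example.

```python
def greedy_multicolor(adjacency):
    """Give each unknown the smallest color not used by an already-colored neighbor."""
    colors = {}
    for node in sorted(adjacency):
        used = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def group_by_color(colors):
    """Return the unknowns grouped by color, in increasing color order."""
    groups = {}
    for node, color in colors.items():
        groups.setdefault(color, []).append(node)
    return [sorted(groups[c]) for c in sorted(groups)]


# Hypothetical off-diagonal nonzero pattern of a 2x3 grid (5-point stencil):
#   0 1 2
#   3 4 5
adjacency = {
    0: [1, 3],
    1: [0, 2, 4],
    2: [1, 5],
    3: [0, 4],
    4: [1, 3, 5],
    5: [2, 4],
}

colors = greedy_multicolor(adjacency)
groups = group_by_color(colors)
print(groups)  # [[0, 2, 4], [1, 3, 5]]
```

In a forward substitution reordered this way, the colors are visited one after another, while the unknowns inside each color can be updated concurrently.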
We discuss a scheme for hierarchical matrices with adaptive cross approximation on symmetric multiprocessing clusters. We propose a set of parallel algorithms that are applicable to hierarchical matrices. The proposed algorithms are implemented using the flat-MPI and hybrid MPI+OpenMP programming models. The performance of these implementations is…
This paper introduces an automatic tuning method for the tiling parameters required in an implementation of the three-dimensional FDTD method based on time-space tiling. In this tuning process, an appropriate range for the tile size is first determined by trial experiments using cubic tiles. The tile shape is then optimized by using the Monte Carlo method.…
In this paper, we discuss an efficient implementation of the three-dimensional multigrid Poisson solver on a many-core coprocessor, the Intel Xeon Phi. We use the modified block red-black (mBRB) Gauss-Seidel (GS) smoother to achieve a sufficient degree of parallelism and a high cache hit ratio. We have vectorized (SIMDized) the GS steps in the smoother by…
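To make the red-black idea behind the smoother concrete, here is a minimal sketch of a standard point-wise red-black Gauss-Seidel sweep for the 7-point discretization of the 3D Poisson equation with fixed (Dirichlet) boundary values. It is not the mBRB variant from the paper, and it is neither blocked nor vectorized; the function names, grid size, and right-hand side are assumptions chosen for the example. Points of one color depend only on points of the other color, which is what exposes the parallelism.

```python
import math


def make_grid(n, value=0.0):
    """Create an n x n x n grid as nested lists."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def red_black_gs_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -laplace(u) = f; boundaries stay fixed."""
    n = len(u)
    for parity in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # First interior k with (i + j + k) % 2 == parity.
                k_start = 1 + ((i + j + 1 + parity) % 2)
                for k in range(k_start, n - 1, 2):
                    u[i][j][k] = (
                        u[i - 1][j][k] + u[i + 1][j][k]
                        + u[i][j - 1][k] + u[i][j + 1][k]
                        + u[i][j][k - 1] + u[i][j][k + 1]
                        + h * h * f[i][j][k]
                    ) / 6.0
    return u


def residual_norm(u, f, h):
    """Euclidean norm of f - A u over the interior points."""
    n = len(u)
    total = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                au = (
                    6.0 * u[i][j][k]
                    - u[i - 1][j][k] - u[i + 1][j][k]
                    - u[i][j - 1][k] - u[i][j + 1][k]
                    - u[i][j][k - 1] - u[i][j][k + 1]
                ) / (h * h)
                total += (f[i][j][k] - au) ** 2
    return math.sqrt(total)


n = 9                      # grid points per dimension (assumed for the example)
h = 1.0 / (n - 1)
u = make_grid(n, 0.0)      # zero initial guess and zero boundary values
f = make_grid(n, 1.0)      # constant right-hand side (assumed)

r0 = residual_norm(u, f, h)
for _ in range(50):
    red_black_gs_sweep(u, f, h)
r1 = residual_norm(u, f, h)
print(r0, r1)
```

Within each color, the innermost loop touches every second point, so all of its updates are independent and could in principle be distributed across threads or SIMD lanes.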