Hai Xiang Lin

An integrated approach for the parallel solution of large sparse systems arising in finite element computations is presented. The approach includes a three-phase preprocessor and a macro dataflow execution scheme. The three phases of the preprocessor are: (1) extracting parallelism by means of an automatic domain decomposer; (2) building the distributed data …
In this article, we discuss the performance modeling and optimization of Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPUs using CUDA. SpMV has a very low computation-to-data ratio, so its performance is bound mainly by memory bandwidth. We propose optimizations of SpMV based on the ELLPACK format from two aspects: (1) enhanced performance for the dense vector by reducing …
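To make the ELLPACK layout concrete, here is a minimal sequential sketch of the standard format. It is not the authors' optimized CUDA kernel, and the function names `to_ellpack` and `ellpack_spmv` are hypothetical. Each row's nonzeros are packed into a fixed-width slot array, short rows are padded with zero values, and every row is computed independently, which is what lets a GPU assign one thread per row. On a GPU the two arrays are usually stored column-major so that neighbouring threads read neighbouring addresses; this sketch uses plain row-major Python lists for readability.

```python
from typing import List, Tuple


def to_ellpack(dense: List[List[float]]) -> Tuple[List[List[float]], List[List[int]]]:
    """Convert a dense matrix to ELLPACK (values, column indices), padding short rows."""
    rows = [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]
    width = max((len(r) for r in rows), default=0)  # longest row sets the slot count
    values: List[List[float]] = []
    cols: List[List[int]] = []
    for r in rows:
        pad = width - len(r)
        values.append([v for _, v in r] + [0.0] * pad)
        # Padded slots point at column 0 with value 0.0, so they contribute nothing.
        cols.append([j for j, _ in r] + [0] * pad)
    return values, cols


def ellpack_spmv(values: List[List[float]], cols: List[List[int]], x: List[float]) -> List[float]:
    """Compute y = A x with A in ELLPACK form; each row is independent of the others."""
    y = [0.0] * len(values)
    for i in range(len(values)):  # on a GPU, one thread would handle row i
        acc = 0.0
        for k in range(len(values[i])):
            acc += values[i][k] * x[cols[i][k]]  # indirect load of x via the column index
        y[i] = acc
    return y


if __name__ == "__main__":
    A = [[4.0, 0.0, 1.0], [0.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    vals, cols = to_ellpack(A)
    print(ellpack_spmv(vals, cols, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 17.0]
```

The inner loop shows why SpMV is bandwidth bound. Each stored entry costs one multiply and one add but requires loading a value, a column index, and an indirectly addressed element of `x`. Padding trades some wasted slots for the regular, coalescing-friendly access pattern that ELLPACK is chosen for.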
For the solution of linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the quasi-minimal residual (IQMR) method that uses the Lanczos process as a major component, combining elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term …