Kaixi Hou

Learn More
Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations(More)
Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition for sparse matrices and graphs, have not received(More)
—Moore's Law effectively doubles the compute power of a microprocessor every 24 months. Over the past decade, however, this doubling in performance has been due to the doubling of the number of cores in a microprocessor rather than clock speed increases. Perhaps nowhere is this more evident than with the Intel Xeon Phi coprocessor. This manycore(More)
GPUs excel at solving many parallel problems and hence dramatically increase the computation performance. In electrodynamics and many other fields, FDTD method is widely used due to its simplicity, accuracy, and practicability. In this paper, we applied the FDTD method on the Fermi Architecture GPUs, the latest product of NVidia, for a better understanding(More)
  • 1