Kaixi Hou

Learn More
Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations(More)
Moore's Law effectively doubles the compute power of a microprocessor every 24 months. Over the past decade, however, this doubling in performance has been due to the doubling of the number of cores in a microprocessor rather than clock speed increases. Perhaps nowhere is this more evident than with the Intel Xeon Phi coprocessor. This many core(More)
Pairwise sequence alignment algorithms, e.g., Smith-Waterman and Needleman-Wunsch, with adjustable gap penalty systems are widely used in bioinformatics. The strong data dependencies in these algorithms, however, prevents compilers from effectively auto-vectorizing them. When programmers manually vectorize them on multi-and many-core processors, two(More)
During the study of biological viruses, a large number of fluorescent particle images are photographed by ultra-microscopes in order to observe the motion and variation of viruses. However, due to diffraction effects, the luminance values of these fluorescent images are distributed in the form of the Point Spread Function. Considering the popularity and(More)
Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition for sparse matrices and graphs, have not received(More)
Small insertions and deletions (indels) of bases in the DNA of an organism can map to functionally important sites in human genes, for example, and in turn, influence human traits and diseases. Dindel detects such indels, particularly small indels (> 50 nucleotides), from short-read data by using a Bayesian approach. Due to its high sensitivity to detect(More)
GPUs excel at solving many parallel problems and hence dramatically increase the computation performance. In electrodynamics and many other fields, FDTD method is widely used due to its simplicity, accuracy, and practicability. In this paper, we applied the FDTD method on the Fermi Architecture GPUs, the latest product of NVidia, for a better understanding(More)
  • 1