CompostBin: A DNA Composition-Based Algorithm for Binning Environmental Shotgun Reads
We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins.
Accelerating Numerical Dense Linear Algebra Calculations with GPUs
This chapter presents the current best design and implementation practices for the acceleration of dense linear algebra on GPUs.
Heterogeneous Streaming
This paper introduces a new heterogeneous streaming library called hetero Streams (hStreams).
Segmenting Point Sets
We introduce a technique for segmenting a point-sampled surface into distinct features without explicit construction of a mesh or other surface representation.
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
We present experimental results on two eight-core Intel Sandy Bridge CPUs with three NVIDIA Fermi GPUs and demonstrate that significant speedups can be obtained by avoiding communication, either on a GPU or between the GPUs.
Numerical Methods for Quantum Monte Carlo Simulations of the Hubbard Model
One of the core problems in materials science is how the interactions between electrons in a solid give rise to properties like … (This work was partially supported by the National Science Foundation.)
Optimizing Krylov Subspace Solvers on Graphics Processing Units
We target the acceleration of the BiCGSTAB solver for GPUs, showing that significant improvement can be achieved by reformulating the method and developing application-specific kernels instead of using the generic CUBLAS library provided by NVIDIA.
Performance of asynchronous optimized Schwarz with one-sided communication
We test the asynchronous optimized Schwarz domain-decomposition iterative method using various one-sided (remote direct memory access) communication schemes with passive target completion, and show that the asynchronous version of optimized Schwarz can outperform the synchronous version even for perfectly balanced partitionings of the problem.
Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs
We analyze the numerical properties of this mixed-precision Cholesky QR (CholQR) and show that it is numerically unstable when the matrix is ill-conditioned.
On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver
A hybrid linear solver based on the Schur complement method has great potential to be a general-purpose solver scalable on tens of thousands of processors.