Efficient Sparse Matrix-Vector Multiplication on CUDA
TLDR
This work develops data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU, with methods that exploit several common forms of matrix structure and alternatives that accommodate greater irregularity.
Implementing sparse matrix-vector multiplication on throughput-oriented processors
  • N. Bell, M. Garland
  • Computer Science
  • Proceedings of the Conference on High Performance…
  • 14 November 2009
TLDR
This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.
Thrust: A Productivity-Oriented Library for CUDA
TLDR
This chapter demonstrates how to leverage the Thrust parallel template library to implement high performance applications with minimal programming effort.
Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods
Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarse-grained...
Particle-based simulation of granular materials
TLDR
This paper presents a simple and effective method for granular material simulation that generalizes its discrete particle model to rigid bodies by distributing particles over their surfaces, achieving two-way coupling between granular materials and rigid bodies.
Optimizing Sparse Matrix-Matrix Multiplication for the GPU
TLDR
The implementation is fully general, and the optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.
A fast multigrid algorithm for mesh deformation
TLDR
This paper shows that a previous least-squares formulation for distortion minimization reduces to a Laplacian system on a general graph structure, for which it derives an analytic expression, and describes an efficient multigrid algorithm for solving the relevant equations.
PyDEC: Software and Algorithms for Discretization of Exterior Calculus
TLDR
This work describes the algorithms, features, and implementation of PyDEC, a Python library for computations related to the discretization of exterior calculus; the algorithms map well to the facilities of numerical libraries such as NumPy and SciPy.