• Publications
A Survey of General-Purpose Computation on Graphics Hardware
TLDR
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for general-purpose computing.
  • 1,881
  • 126
Memory access scheduling
TLDR
We introduce memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.
  • 937
  • 109
GPU Computing
TLDR
We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics.
  • 1,518
  • 93
Gunrock: a high-performance graph processing library on the GPU
TLDR
"Gunrock," our high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock instead implements a data-centric abstraction centered on operations on a vertex or edge frontier.
  • 271
  • 68
Scan primitives for GPU computing
TLDR
The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications on parallel hardware generally and, we believe, specifically the GPU.
  • 612
  • 50
Research Challenges for On-Chip Interconnection Networks
TLDR
On-chip interconnection networks (OCINs) are rapidly becoming a key enabling technology for commodity multicore processors and for the SoCs common in consumer embedded systems. The National Science Foundation initiated a workshop that addressed upcoming research issues in OCIN technology, design, and implementation and set a direction for researchers in the field.
  • 479
  • 31
Imagine: Media Processing with Streams
TLDR
The power-efficient Imagine stream processor achieves performance densities comparable to those of special-purpose embedded processors.
  • 381
  • 27
The Imagine Stream Processor
TLDR
The Imagine Stream Processor is a single-chip programmable media processor with 48 parallel ALUs.
  • 258
  • 21
Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths
TLDR
We present three parallel-friendly and work-efficient methods to solve the Single-Source Shortest Paths (SSSP) problem: Workfront Sweep, Near-Far, and Bucketing.
  • 149
  • 21
Register organization for media processing
TLDR
We show that partitioning the register file along three axes reduces the area, delay, and power dissipation of a media processor by factors of 195, 230, and 430, respectively.
  • 314
  • 20