• Publications
A quantitative study of irregular programs on GPUs
TLDR
This paper defines two measures of irregularity called control-flow irregularity and memory-access irregularity, and investigates, using performance-counter measurements, how irregular GPU kernels differ from regular kernels with respect to these measures.
  • 293
  • 42
The tao of parallelism in algorithms
TLDR
We introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures.
  • 344
  • 31
FPC: A High-Speed Compressor for Double-Precision Floating-Point Data
TLDR
This paper describes and evaluates FPC, a fast lossless compression algorithm for linear streams of 64-bit floating-point data.
  • 170
  • 27
Lonestar: A suite of parallel irregular programs
TLDR
This work is supported in part by NSF grants 0833162, 0719966, 0702353, 0724966, 0739601, and 0615240, as well as grants from IBM, SUN, and Intel Corporation.
  • 158
  • 17
An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm
TLDR
This chapter describes the first CUDA implementation of the Barnes Hut n-body algorithm that runs entirely on the GPU.
  • 125
  • 15
Similar qualitative and quantitative changes of mitochondrial respiration following strength and endurance training in normoxia and hypoxia in sedentary humans.
Endurance and strength training are established as distinct exercise modalities, increasing either mitochondrial density or myofibrillar units. Recent research, however, suggests that mitochondrial…
  • 123
  • 13
Extended Large Scale Sketch-Based 3D Shape Retrieval
TLDR
Large-scale sketch-based 3D shape retrieval has received increasing attention in the content-based 3D object retrieval community.
  • 58
  • 12
Measuring GPU Power with the K20 Built-in Sensor
TLDR
GPU-accelerated programs are becoming increasingly common in HPC, personal computers, and even handheld devices, making it important to optimize their energy efficiency.
  • 55
  • 10
A GPU implementation of inclusion-based points-to analysis
TLDR
We describe a high-performance GPU implementation of an important graph algorithm used in compilers such as gcc and LLVM: Andersen-style inclusion-based points-to analysis. The implementation achieves an average speedup of 7x compared to a sequential CPU implementation and outperforms a parallel implementation of the same algorithm running on 16 CPU cores.
  • 87
  • 9
A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries
TLDR
We perform a more comprehensive comparison of twenty-six (eighteen originally participating algorithms and eight additional state-of-the-art or new) retrieval methods by evaluating them on the common benchmark.
  • 105
  • 8