• Publications
Rodinia: A benchmark suite for heterogeneous computing
TLDR
This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing.
  • 2,162
  • 344
Temperature-aware microarchitecture
TLDR
This paper describes HotSpot, an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package that can regulate operating temperature when the package's capacity is exceeded.
  • 1,252
  • 183
Temperature-aware microarchitecture: Modeling and implementation
TLDR
This paper describes HotSpot, an accurate yet fast and practical thermal model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package.
  • 752
  • 86
Energy-efficient mechanisms for managing thread context in throughput processors
TLDR
We show that a 6-entry per-thread register file cache reduces the number of reads and writes to the main register file by 50% and 59%, respectively.
  • 239
  • 43
A performance study of general-purpose applications on graphics processors using CUDA
TLDR
This paper uses NVIDIA's C-like CUDA language and an engineering sample of their recently introduced GTX 260 GPU to explore the effectiveness of GPUs for a variety of application types, and describes some specific coding idioms that improve their performance on the GPU.
  • 655
  • 29
Temperature-Aware Microarchitecture: Extended Discussion and Results
With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques…
  • 81
  • 20
Impact of Process Variations on Multicore Performance Symmetry
TLDR
Multi-core architectures introduce a new granularity at which process variations may occur, yielding asymmetry among cores that were designed (and that software expects) to be symmetric in performance.
  • 128
  • 13
Dynamic warp subdivision for integrated branch and memory divergence tolerance
TLDR
Dynamic warp subdivision allows a single warp to occupy more than one slot in the scheduler without requiring extra register file space.
  • 232
  • 12
CACTI 4.0
The original CACTI tool was released in 1994 to give computer architects a fast tool to model SRAM caches. It has been widely adopted and used since. Two new versions were released to add area and…
  • 62
  • 10
Efficient parallel merge sort for fixed and variable length keys
TLDR
We design a high-performance parallel merge sort that is well-suited for highly parallel systems.
  • 60
  • 7