Learn More
This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization(More)
With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package's capacity is exceeded. Evaluating such techniques, however, requires a thermal model that is practical(More)
With cooling costs rising exponentially, designing cooling solutions for worst-case power dissipation is prohibitively expensive. Chips that can autonomously modify their execution and power-dissipation characteristics permit the use of lower-cost cooling solutions while still guaranteeing safe temperature regulation. Evaluating techniques for this(More)
Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on(More)
The original CACTI tool was released in 1994 to give computer architects a fast tool to model SRAM caches. It has been widely adopted and used since. Two new versions were released to add area and active power modeling to CACTI. This new version adds a model for leakage power and updates the basic circuit structure and device parameters to better reflect(More)
With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package’s capacity is exceeded. Evaluating such techniques, however, requires a thermal model that is practical(More)
Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of generalpurpose applications compared to contemporary general-purpose processors (CPUs). This paper uses(More)
SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget. However, throughput is reduced when a set of threads operating in lockstep (a warp) are stalled due to long latency memory accesses. The resulting idle cycles are extremely(More)
Multi-core architectures introduce a new granularity at which process variations may occur, yielding asymmetry among cores that were designed---and that software expects---to be symmetric in performance. The chief source of this phenomenon are highly correlated, "systematic" within-die variations such as optical imperfections yielding variations across the(More)
In recent years, power density in microprocessors has doubled every three years, and experts expect this rate to increase within one to two generations as feature sizes and frequencies scale faster than operating voltages. Because a microprocessor consumes energy and converts it into heat, the corresponding exponential rise in heat density is creating(More)