This brief note points out something obvious, something the authors "knew" without really understanding. With apologies to those who did understand, we offer it to those others who, like us, missed the point. We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed; each is improving…
Estimating power consumption is critical for hardware and software developers, and among the latter, particularly for OS programmers writing process schedulers. However, obtaining processor and system power consumption information can be non-trivial. Simulators are time-consuming and prone to error. Power meters report whole-system consumption, but cannot give…
Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simulation intractable. We attack this problem via an automated…
Although unstructured mesh algorithms are a popular means of solving problems across a broad range of disciplines, from texture mapping to computational fluid dynamics, they are often dominated not by computation, but by mesh overhead. Our study of an object-oriented mesh-based benchmark reveals that 72% of its execution time is spent on mesh-related…
Impulse is a memory system architecture that adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to remap their data structures in memory. As a result, they can control how their data is accessed and cached, which can improve cache and bus utilization. The Impulse design does not require any…
PARSEC is a reference application suite used in industry and academia to assess new Chip Multiprocessor (CMP) designs. No investigation to date has profiled PARSEC on real hardware to better understand scaling properties and bottlenecks. This understanding is crucial in guiding future CMP designs for these kinds of emerging workloads. We use hardware…
We are attacking the memory bottleneck by building a "smart" memory controller that improves effective memory bandwidth, bus utilization, and cache efficiency by letting applications dictate how their data is accessed and cached. This paper describes a Parallel Vector Access unit (PVA), the vector memory subsystem that efficiently "gathers" sparse, …