Learn More
Computer architecture has experienced a major paradigm shift from focusing only on raw performance to considering power-performance efficiency as the defining factor of the emerging systems. Along with this shift has come increased interest in workload characterization. This interest fuels two closely related areas of research. First, various studies(More)
Managing power concerns in icroprocessors has become a pressing research problem across the domains of computer architecture, CAD, and compilers. As a result, several parameterized cycle-level power simulators have been introduced. While these simulators can be quite useful for microarchitectural studies, their generality limits how accurate they can be for(More)
This paper demonstrates a first-order, linear power estimation model ha uses performance counters to estimate run-time CPU and memory power consumption of the Intel PXA255 processor. Our model uses a set of <i>power weights</i> that map hardware performance counter values to processor and memory power consumption. Power weights are derived offline once per(More)
— The Intel Threading Building Blocks (TBB) run-time library [1] is a popular C++ parallelization environment [2][3] that offers a set of methods and templates for creating parallel applications. Through support of parallel tasks rather than parallel threads, the TBB runtime library offers improved performance scalability by dynamically redistributing(More)
Managing power concerns in microprocessors has become a pressing research problem across the domains of computer architecture, CAD, and compilers. As a result, several parameterized cycle-level power simulators have been introduced. While these simulators can be quite useful for microarchitectural studies, their generality limits how accurate they can be(More)
Chip multi-processors (CMPs) already have widespread commercial availability, and technology roadmaps project enough on-chip transistors to replicate tens or hundreds of current processor cores. How will we express parallelism, partition applications, and schedule/place/migrate threads on these highly-parallel CMPs?This paper presents and evaluates a new(More)
The design process for chip multiprocessors (CMPs) requires extremely long simulation times to explore performance, power, and thermal issues, particularly when operating system (OS) effects are included. In response, our novel FPGA-based emulation methodology models a full CMP design including applications and an OS. Activity counters programmed into the(More)
Creating efficient, scalable dynamic parallel runtime systems for chip multiprocessors (CMPs) requires understanding the overheads that manifest at high core counts and small task sizes. In this article, we assess these overheads on Intel's Threading Building Blocks (TBB) and OpenMP. First, we use real hardware and simulations to detail various scheduler(More)
Hardware Performance Counters (HPCs) have seen increasing use over the past decade or more, both for performance debugging systems at hardware design time, as well as for software development and performance tuning efforts. Most recently, there has also been significant research effort dedicated to using HPCs for large-scale workload phase behavior(More)