Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate…
This paper describes FAST, a novel simulation methodology that can produce simulators that (i) are orders of magnitude faster than comparable simulators, (ii) are cycle-accurate, (iii) model the entire system running unmodified applications and operating systems, (iv) provide visibility with minimal simulation performance impact and (v) are capable of…
This paper describes the FAST methodology that enables a single FPGA to accelerate the performance of cycle-accurate computer system simulators modeling modern, realistic SoCs, embedded systems and standard desktop/laptop/server computer systems. The methodology partitions a simulator into (i) a functional model that simulates the functionality of…
Reduced or bounded power consumption has become a first-order requirement for modern hardware design. As a design progresses and more detailed information becomes available, more accurate power estimations become possible, but at the cost of significantly slower simulation speeds. Power simulation that is both sufficiently accurate and fast would have a…
for Hotchips 2006
This paper introduces column caching, a flexible mechanism that allows software to dynamically customize cache behavior through fine-grain control of its placement policy. For a set-associative cache, specific data can be restricted to a subset of the usual target cache set during replacement. Through this simple enhancement, column caching enables the cache to…
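The replacement restriction described in the column caching abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `ColumnCache`, the `allowed_ways` parameter, the 64-byte line size, and the LRU victim choice are all assumptions made for illustration. The only idea taken from the abstract is that, in a set-associative cache, a victim is chosen from a software-selected subset of the ways ("columns") of the target set.

```python
class ColumnCache:
    """Toy set-associative cache with column-restricted replacement (hypothetical sketch)."""

    def __init__(self, num_sets, num_ways, line_size=64):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_size = line_size  # assumed line size in bytes
        # tags[s][w] is the tag stored in way w of set s, or None if empty
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # last_use[s][w] is the logical time of the last access (for LRU)
        self.last_use = [[0] * num_ways for _ in range(num_sets)]
        self.clock = 0

    def access(self, addr, allowed_ways=None):
        """Return True on a hit. On a miss, fill a way chosen from allowed_ways."""
        block = addr // self.line_size
        s = block % self.num_sets
        tag = block // self.num_sets
        self.clock += 1

        # Lookup checks every way; the restriction applies only to replacement.
        for w in range(self.num_ways):
            if self.tags[s][w] == tag:
                self.last_use[s][w] = self.clock
                return True

        # Miss: pick the least recently used way among the permitted columns.
        ways = range(self.num_ways) if allowed_ways is None else allowed_ways
        victim = min(ways, key=lambda w: self.last_use[s][w])
        self.tags[s][victim] = tag
        self.last_use[s][victim] = self.clock
        return False


# Usage: keep one line resident in column 0 while streaming data through columns 1-3.
cache = ColumnCache(num_sets=1, num_ways=4)
cache.access(0, allowed_ways=[0])              # miss, fills column 0
for i in range(1, 11):
    cache.access(i * 64, allowed_ways=[1, 2, 3])  # streaming data never evicts column 0
still_resident = cache.access(0)               # hit, because column 0 was protected
```

In this sketch an unrestricted LRU cache of the same size would have evicted the first line during the streaming loop, which is the kind of interference the placement restriction is meant to avoid.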
Fast and accurate simulation of multicore systems requires a parallelized simulator. This paper describes a novel method to build cycle-accurate-capable and parallelizable functional-first simulators of multicore targets.
Graphics Processing Units (GPUs) have numerous configuration and design options, including core frequency, number of parallel compute units (CUs), and available memory bandwidth. At many stages of the design process, it is important to estimate how application performance and power are impacted by these options. This paper describes a GPU performance and…