This paper describes FAST, a novel simulation methodology that can produce simulators that (i) are orders of magnitude faster than comparable simulators, (ii) are cycle-accurate, (iii) model the entire system running unmodified applications and operating systems, (iv) provide visibility with minimal simulation performance impact and (v) are capable of… (More)
Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate… (More)
for Hotchips 2006
This paper describes the FAST methodology that enables a single FPGA to accelerate the performance of cycle-accurate computer system simulators modeling modern, realistic SoCs, embedded systems and standard desktop/laptop/server computer systems. The methodology partitions a simulator into (i) a <i>functional model</i> that simulates the functionality of… (More)
—Fast and accurate simulation of multicore systems requires a parallelized simulator. This paper describes a novel method to build parallelizable and cycle-accurate-capable functional-first simulators of multicore targets.
Hardware-design languages typically impose a rigid communication hierarchy that follows module instantiation. This leads to an undesirable side-effect where changes to a child's interface result in changes to the parents. Soft connections address this problem by allowing the user to specify connection endpoints that are automatically connected at… (More)
—Graphics Processing Units (GPUs) have numerous configuration and design options, including core frequency, number of parallel compute units (CUs), and available memory bandwidth. At many stages of the design process, it is important to estimate how application performance and power are impacted by these options. This paper describes a GPU performance and… (More)
We propose a way to improve the performance of embedded processors running data-intensive applications by allowing software to allocate on-chip memory on an application-specific basis. On-chip memory in the form of cache can be made to act like scratch-pad memory via a novel hardware mechanism, which we call <italic>column caching.</italic> Column caching… (More)