Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduction and prefetching suggest that further useful patterns emerge with a macroscopic, coarse-grain view. To exploit coarse-grain behavior, previous work extended conventional caches…
We present two full-custom implementations of the Register Alias Table (RAT) for a 4-way superscalar, dynamically-scheduled processor in a commercial 130 nm CMOS technology. The implementations differ in how they organize the embedded global checkpoints (GCs) that support speculative execution. In the first implementation, representative of early…
Using full-custom layouts in 130 nm technology, this work studies how the latency and energy of a checkpointed, CAM-based Register Alias Table (cRAT) vary as a function of the window size, the issue width, and the number of embedded global checkpoints (GCs). These results are compared with those of the SRAM-based RAT (sRAT). Understanding these variations is…
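To make the checkpointing idea concrete, the following Python sketch models the SRAM-based organization (sRAT) functionally: a table indexed by architectural register holds the current physical mapping, and each global checkpoint is a full copy of that table taken at a branch. It is an illustrative assumption, not the circuits studied in these papers; the class and method names (`CheckpointedRAT`, `take_checkpoint`, `restore`) are hypothetical, and latency and energy are not modelled. A CAM-based cRAT would instead keep one entry per physical register and search it associatively, which this sketch does not show.

```python
from collections import deque


class CheckpointedRAT:
    """Functional model of an SRAM-style register alias table with a fixed
    number of global checkpoints (GCs). Hypothetical sketch: mapping
    behaviour only, no circuit latency or energy."""

    def __init__(self, num_arch_regs: int, num_phys_regs: int, num_gcs: int):
        # Architectural register i initially maps to physical register i.
        self.map = list(range(num_arch_regs))
        # The remaining physical registers start on the free list.
        self.free = deque(range(num_arch_regs, num_phys_regs))
        self.num_gcs = num_gcs
        self.checkpoints = {}  # checkpoint id -> (map copy, free-list copy)
        self.next_id = 0

    def rename(self, dst: int, srcs: list[int]) -> tuple[int, list[int]]:
        """Rename one instruction: read source mappings first, then
        allocate a new physical register for the destination."""
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical registers: rename stalls")
        new_phys = self.free.popleft()
        self.map[dst] = new_phys
        return new_phys, phys_srcs

    def take_checkpoint(self) -> int:
        """Snapshot the map (e.g., at a branch). Fails when every GC is in
        use, which in hardware would stall renaming."""
        if len(self.checkpoints) >= self.num_gcs:
            raise RuntimeError("all global checkpoints in use: rename stalls")
        cp_id = self.next_id
        self.next_id += 1
        # Simplification: commit-time freeing is not modelled, so copying
        # the free list alongside the map is enough to roll back here.
        self.checkpoints[cp_id] = (list(self.map), deque(self.free))
        return cp_id

    def restore(self, cp_id: int) -> None:
        """Recover from a misprediction: reinstate the snapshot and discard
        this checkpoint and every younger one."""
        saved_map, saved_free = self.checkpoints[cp_id]
        self.map = list(saved_map)
        self.free = deque(saved_free)
        for k in [k for k in self.checkpoints if k >= cp_id]:
            del self.checkpoints[k]

    def release(self, cp_id: int) -> None:
        """Free a checkpoint once its branch resolves as predicted."""
        del self.checkpoints[cp_id]


if __name__ == "__main__":
    rat = CheckpointedRAT(num_arch_regs=4, num_phys_regs=8, num_gcs=2)
    rat.rename(dst=1, srcs=[0])   # r1 -> p4
    cp = rat.take_checkpoint()    # snapshot at a branch
    rat.rename(dst=1, srcs=[1])   # wrong path: r1 -> p5
    rat.restore(cp)               # misprediction: r1 maps to p4 again
    print(rat.map)                # [0, 4, 2, 3]
```

In this model each additional GC is another full copy of the map, one intuition for why checkpoint storage grows with the number of GCs.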
We study the energy, latency, and area characteristics of two Counting Bloom Filter implementations using full-custom layouts in a commercial 0.13 μm technology. The first implementation, S-CBF, uses an SRAM array of counts and a shared counter. The second, L-CBF, uses an array of up/down linear feedback shift registers. Circuit-level simulations…
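As background for what both implementations compute, the Python sketch below models a counting Bloom filter behaviourally: each key selects a set of counters, insertion increments them, deletion decrements them, and a lookup can answer "definitely absent" or "maybe present". It does not model the S-CBF or L-CBF circuits; the salted SHA-256 hashing, the `counter_bits` width, and all names are assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Behavioural model of a counting Bloom filter (hypothetical sketch).

    Each key selects a set of counters via salted hashes. Insert increments
    them, delete decrements them, and a lookup reports "maybe present" only
    if every selected counter is non-zero.
    """

    def __init__(self, num_counters: int, num_hashes: int, counter_bits: int = 4):
        self.counts = [0] * num_counters
        self.num_hashes = num_hashes
        self.max_count = (1 << counter_bits) - 1  # largest value a counter holds

    def _indexes(self, key: int) -> set[int]:
        # Salted SHA-256 stands in for hardware hash functions (assumption);
        # duplicate indexes collapse, so each counter moves at most once per key.
        return {
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:4], "little")
            % len(self.counts)
            for i in range(self.num_hashes)
        }

    def insert(self, key: int) -> None:
        idx = self._indexes(key)
        if any(self.counts[j] == self.max_count for j in idx):
            raise OverflowError("counter would overflow; widen counter_bits")
        for j in idx:
            self.counts[j] += 1

    def delete(self, key: int) -> None:
        # Callers should only delete keys they previously inserted.
        idx = self._indexes(key)
        if any(self.counts[j] == 0 for j in idx):
            raise ValueError("key is not present")
        for j in idx:
            self.counts[j] -= 1

    def might_contain(self, key: int) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.counts[j] > 0 for j in self._indexes(key))


if __name__ == "__main__":
    cbf = CountingBloomFilter(num_counters=64, num_hashes=3)
    cbf.insert(0x1000)
    print(cbf.might_contain(0x1000))  # True
    cbf.delete(0x1000)
    print(cbf.might_contain(0x1000))  # False
```

The counters are what distinguish this structure from a plain Bloom filter: they make deletion possible, and their width is the quantity a hardware implementation must store and update.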
This paper investigates how the latency and energy of register alias tables (RATs) vary as a function of the number of global checkpoints (GCs), the processor issue width, and the window size. It improves upon previous RAT checkpointing work, which ignored the actual latency and energy trade-offs and evaluated performance solely in terms of instructions…
This work studies the physical-level characteristics of the recently proposed compacted matrix instruction scheduler for dynamically-scheduled, superscalar processors. Previous work focused on the matrix scheduler's architecture and argued for its speed and scalability advantages. However, no physical-level implementation or models were reported for…
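For readers unfamiliar with matrix scheduling, the Python sketch below models the basic dependency-matrix wakeup idea: row *i* is a bit vector with one bit per window entry that instruction *i* still waits on, and issuing an instruction clears its column so that dependants become ready. This is an assumed illustration of the plain, non-compacted matrix only; the compacted organization studied in the paper is not modelled, and the class, its methods, and the single-cycle wakeup are hypothetical.

```python
class MatrixScheduler:
    """Dependency-matrix wakeup model (hypothetical sketch of the basic,
    non-compacted matrix).

    Row i has bit j set while window entry i still waits for the
    instruction in entry j. An entry is ready when its row is all zeros.
    """

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.rows = [0] * window_size      # one dependency bit vector per entry
        self.valid = [False] * window_size

    def insert(self, entry: int, producers: list[int]) -> None:
        # producers: window entries holding instructions this one reads.
        row = 0
        for p in producers:
            if self.valid[p]:  # producers that already issued impose no wait
                row |= 1 << p
        self.rows[entry] = row
        self.valid[entry] = True

    def ready(self) -> list[int]:
        return [i for i in range(self.window_size)
                if self.valid[i] and self.rows[i] == 0]

    def issue(self, entry: int) -> None:
        # Issuing frees the entry and clears its column in every row,
        # waking dependants (single-cycle producer latency assumed).
        if not self.valid[entry] or self.rows[entry] != 0:
            raise ValueError("entry is not ready")
        self.valid[entry] = False
        mask = ~(1 << entry)
        for i in range(self.window_size):
            self.rows[i] &= mask


if __name__ == "__main__":
    s = MatrixScheduler(window_size=4)
    s.insert(0, [])        # I0 has no in-flight producers
    s.insert(1, [0])       # I1 depends on I0
    s.insert(2, [0, 1])    # I2 depends on I0 and I1
    print(s.ready())       # [0]
    s.issue(0)
    print(s.ready())       # [1]
    s.issue(1)
    print(s.ready())       # [2]
```

Because every row has one bit per window entry, the matrix grows with the square of the window size, which is the kind of scaling a physical-level study has to quantify.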
Register renaming is a performance-critical component of modern, dynamically-scheduled processors. Register renaming latency increases as a function of several architectural parameters (e.g., the processor's issue width, window size, and checkpoint count). Pipelining the register renaming logic can help avoid restricting the processor clock…