Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduction and prefetching suggest that further useful patterns emerge with a macroscopic, coarse-grain view. To exploit coarse-grain behavior, previous work extended conventional caches…
Using full-custom layouts in 130 nm technology, this work studies how the latency and energy of a checkpointed, CAM-based Register Alias Table (cRAT) vary as a function of the window size, the issue width, and the number of embedded global checkpoints (GCs). These results are compared to those of the SRAM-based RAT (sRAT). Understanding these variations is…
We study the energy, latency, and area characteristics of two Counting Bloom Filter implementations using full-custom layouts in a commercial 0.13 μm technology. The first implementation, S-CBF, uses an SRAM array of counts and a shared counter. The second, L-CBF, utilizes an array of up/down linear feedback shift registers. Circuit level simulations…
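To make the L-CBF idea concrete, the following is a minimal software sketch of a counting Bloom filter whose per-entry counters are up/down linear feedback shift registers rather than binary counts. It is not the circuit from the paper: the counter width `N_BITS`, the tap mask `TAPS`, the choice of `ZERO` as the state that stands for a count of zero, the SHA-256-based indexing, and the class name `LfsrCountingBloomFilter` are all assumptions made for illustration.

```python
import hashlib

N_BITS = 4                 # assumed counter width
TAPS = 0b0011              # assumed feedback taps; yields a 15-state cycle
ZERO = 0b0001              # LFSR state chosen here to mean "count is zero"
MASK = (1 << N_BITS) - 1


def _parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def lfsr_up(state):
    """Advance the LFSR one step (used as 'increment')."""
    feedback = _parity(state & TAPS)
    return (state >> 1) | (feedback << (N_BITS - 1))


def lfsr_down(state):
    """Step the LFSR backwards (used as 'decrement'); inverse of lfsr_up."""
    partial = (state << 1) & MASK
    low_bit = (state >> (N_BITS - 1)) ^ _parity(partial & TAPS)
    return partial | low_bit


class LfsrCountingBloomFilter:
    """Hypothetical counting Bloom filter with one up/down LFSR per entry."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.cells = [ZERO] * size

    def _indexes(self, item):
        # Assumed hashing scheme: salted SHA-256, reduced modulo the table size.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        # A counter wraps back to ZERO after 15 increments; this sketch does
        # not guard against that overflow.
        for idx in self._indexes(item):
            self.cells[idx] = lfsr_up(self.cells[idx])

    def remove(self, item):
        # Only valid for items that were previously added.
        for idx in self._indexes(item):
            self.cells[idx] = lfsr_down(self.cells[idx])

    def might_contain(self, item):
        # Membership only needs a zero/non-zero test on each counter.
        return all(self.cells[idx] != ZERO for idx in self._indexes(item))
```

The membership test never decodes a counter into a number; it only compares against the zero state, which is why the counting order of the LFSR does not matter in this sketch.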
We present two full-custom implementations of the Register Alias Table (RAT) for a 4-way superscalar dynamically-scheduled processor in a commercial 130 nm CMOS technology. The implementations differ in the way they organize the embedded global checkpoints (GCs) which support speculative execution. In the first implementation, representative of early…
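As a behavioral illustration of what a RAT with global checkpoints does, here is a minimal sketch that renames architectural registers to physical ones and can roll back to a saved mapping after a misspeculation. It models function only, not either of the layouts described above. The class name `CheckpointedRat`, the register counts, the list-based free list, and the policy of snapshotting the free list together with the table are all simplifying assumptions.

```python
class CheckpointedRat:
    """Hypothetical behavioral model of a RAT with global checkpoints."""

    def __init__(self, num_arch=8, num_phys=32, max_checkpoints=4):
        # Architectural register i initially maps to physical register i.
        self.table = list(range(num_arch))
        self.free = list(range(num_arch, num_phys))
        self.max_checkpoints = max_checkpoints
        self.checkpoints = []  # oldest first

    def rename(self, dest, sources):
        """Look up the sources, then map dest to a fresh physical register."""
        phys_sources = [self.table[s] for s in sources]
        phys_dest = self.free.pop(0)
        self.table[dest] = phys_dest
        return phys_dest, phys_sources

    def take_checkpoint(self):
        """Snapshot the current mapping, e.g. when a branch is renamed."""
        if len(self.checkpoints) == self.max_checkpoints:
            raise RuntimeError("no free global checkpoint")
        self.checkpoints.append((list(self.table), list(self.free)))
        return len(self.checkpoints) - 1

    def restore(self, checkpoint_id):
        """Roll back to a checkpoint and discard it and all younger ones."""
        table, free = self.checkpoints[checkpoint_id]
        self.table, self.free = list(table), list(free)
        del self.checkpoints[checkpoint_id:]
```

In this model the number of checkpoints only bounds how many speculative branches can be in flight; the latency and energy costs that the abstracts study are not represented.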
This paper investigates how the latency and energy of register alias tables (RATs) vary as a function of the number of global checkpoints (GCs), processor issue width, and window size. It improves upon previous RAT checkpointing work that ignored the actual latency and energy tradeoffs and focused solely on evaluating performance in terms of instructions…
This work studies hardware complexity (physical level characteristics) of the recently proposed compacted matrix instruction scheduler for dynamically scheduled, superscalar processors. Previous work focused on the matrix scheduler's architecture and argued in support of its speed and scalability advantages; however, neither actual physical-level…