The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8µm, 0.35µm, and …
In conventional superscalar microarchitectures with partitioned integer and floating-point resources, all floating-point resources are idle during execution of integer programs. Palacharla and Smith [26] addressed this drawback and proposed that the floating-point subsystem be augmented to support integer operations. The hardware changes required are …
Today's commodity microprocessors require a low-latency memory system to achieve high sustained performance. The conventional high-performance memory system provides fast data access via a large secondary cache. But large secondary caches can be expensive, particularly in large-scale parallel systems with many processors (and thus many caches). We evaluate a …
This paper explores the complexity of implementing directory protocols by examining their mechanisms: primitive operations on directories, caches, and network interfaces. We compare the following protocols: Dir1B, Dir4B, Dir4NB, DirnNB [2], …