Learn More
A proposed performance model for superscalar processors consists of: 1) a component that models the relationship between instructions issued per cycle and the size of the instruction window under ideal conditions; and 2) methods for calculating transient performance penalties due to branch mispredictions, instruction cache misses, and data cache misses.(More)
The activity within a processor following a cache miss is studied via a series of simulation experiments. This is a preliminary step toward developing ways of mitigating data cache miss penalties, especially for long misses. With a modest sized reorder buffer (ROB) of 64 entries , structural blockages due to a full ROB are the major cause of the cache miss(More)
Analytical modeling is applied to the automated design of application-specific superscalar processors. Using an analytical method bridges the gap between the size of the design space and the time required for detailed cycle-accurate simulations. The proposed design framework takes as inputs the design targets (upper bounds on execution time, area, and(More)
A common way of representing processor performance is to use Cycles per Instruction (CPI) `stacks' which break performance into a baseline CPI plus a number of individual miss event CPI components. CPI stacks can be very helpful in gaining insight into the behavior of an application on a given microprocessor; consequently, they are widely used by software(More)
A mechanistic model for out-of-order superscalar processors is developed and then applied to the study of microarchitecture resource scaling. The model divides execution time into intervals separated by disruptive miss events such as branch mispredictions and cache misses. Each type of miss event results in characterizable performance behavior for the(More)
Front-end instruction delivery accounts for a significant fraction of the energy consumed in a dynamic superscalar processor. The issue queue in these processors serves two crucial roles: it bridges the front and back ends of the processor and serves as the window of instructions for the out-of-order engine. A mismatch between the front end producer rate(More)
Just-In-Time instruction delivery is a general method for saving energy in a microprocessor by dynamically limiting the number of in-flight instructions. The goal is to save energy by 1) fetching valid instructions no sooner than necessary, avoiding cycles stalled in the pipeline -- especially the issue queue, and 2) reducing the number of fetches and(More)
Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing 10 18 floating-point operations per second), consuming no more than 20 MW in power, by around the year 2020. This paper outlines a new architecture, the Active Memory Cube, which reduces the energy of(More)