Micro-operation cache: a power aware frontend for variable instruction length ISA

@inproceedings{solomon2001uopcache,
  title={Micro-operation cache: a power aware frontend for variable instruction length ISA},
  author={B. Solomon and A. Mendelson and D. Orenstien and Y. Almog and R. Ronen},
  booktitle={ISLPED'01: Proceedings of the 2001 International Symposium on Low Power Electronics and Design (IEEE Cat. No.01TH8581)},
  year={2001}
}
Introduces the micro-operation cache (Uop Cache, UC), designed to reduce the processor frontend's power and energy consumption without performance degradation. The UC caches basic blocks of instructions pre-decoded into micro-operations (uops) and fetches a single basic block's worth of uops per cycle. Fetching complete pre-decoded basic blocks eliminates the need to repeatedly decode variable-length instructions and simplifies the process of predicting, fetching, rotating, and aligning fetched…
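The caching scheme described in the abstract can be sketched roughly as follows. This is an illustrative model, not the paper's actual design: the direct lookup by basic-block start address, the `decode` stand-in, and all names are assumptions made for the sketch.

```python
# Hypothetical sketch of a micro-operation (uop) cache: lines hold whole
# pre-decoded basic blocks keyed by the block's start address. On a hit,
# the variable-length decoder is bypassed entirely (the power saving the
# paper targets); on a miss, the decoder runs and fills the cache.

def decode(address):
    """Stand-in for the variable-length instruction decoder: returns the
    uops for the basic block starting at `address` (dummy data here)."""
    return [f"uop_{address}_{i}" for i in range(3)]

class UopCache:
    def __init__(self):
        self.lines = {}      # basic-block start address -> list of uops
        self.hits = 0
        self.decodes = 0

    def fetch(self, address):
        if address in self.lines:
            # Hit: deliver the whole pre-decoded basic block in one cycle.
            self.hits += 1
            return self.lines[address]
        # Miss: fall back to the decoder and cache the result.
        self.decodes += 1
        uops = decode(address)
        self.lines[address] = uops
        return uops

uc = UopCache()
for addr in [0x100, 0x200, 0x100, 0x100]:  # a tiny loop re-fetching 0x100
    uc.fetch(addr)
print(uc.hits, uc.decodes)  # → 2 2
```

In a loop, each basic block is decoded once and then served pre-decoded on every subsequent iteration, which is where the repeated-decode energy is saved.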


A Complexity-Effective Decoding Architecture Based on Instruction Streams
Complex decoding logic is a performance bottleneck for high-frequency microprocessors that implement variable-length instruction set architectures. The need to remove this complexity from…
Using a serial cache for energy efficient instruction fetching
This paper examines a high-bandwidth fetch architecture augmented with an instruction cache way predictor and shows that a serial fetch architecture achieves approximately the same energy reduction and performance as way-prediction architectures, without the added structures and recovery complexity.
High Performance and Energy Efficient Serial Prefetch Architecture
This design, called Serial Prefetching, combines high-fetch-bandwidth branch prediction and an efficient instruction prefetching architecture with a low-energy instruction cache, and explores the benefit of decoupling the tag component of the cache from the data component.
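The serial (phased) cache access underlying the two entries above can be sketched with a toy energy model: read all tag arrays first, then read only the one matching data way, instead of reading every way's data in parallel. The 4-way geometry and the per-array energy costs below are made-up illustrative numbers, not figures from either paper.

```python
# Toy comparison of parallel vs. serial (phased) set-associative
# instruction-cache access energy. All constants are hypothetical.

WAYS = 4
TAG_READ_ENERGY = 1    # assumed energy units per tag array read
DATA_READ_ENERGY = 4   # assumed energy units per data array read

def parallel_access_energy():
    # Conventional cache: probe every tag array and every data array at once,
    # then select the hitting way's data after the fact.
    return WAYS * TAG_READ_ENERGY + WAYS * DATA_READ_ENERGY

def serial_access_energy(hit=True):
    # Serial cache: compare tags first, then read only the matching data way.
    # A miss reads no data array at all. The cost is an extra cycle of latency.
    tags = WAYS * TAG_READ_ENERGY
    data = DATA_READ_ENERGY if hit else 0
    return tags + data

print(parallel_access_energy())       # → 20
print(serial_access_energy())         # → 8
print(serial_access_energy(hit=False))  # → 4
```

The trade the papers study is exactly this: serialization cuts data-array reads per access from `WAYS` to at most one, at the cost of added fetch latency that the surrounding fetch architecture must hide.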
Sharing the instruction cache among lean cores on an asymmetric CMP for HPC applications
This paper analyzes the performance, power, and area impact of such a design on an ACMP with one high-performance core and multiple low-power cores, and finds the sweet spot to be a wide interconnect to access the shared I-cache plus a few line buffers, which provide the bandwidth and latency required to sustain performance.
Lazy Retirement: A Power Aware Register Management Mechanism
In this paper we describe "Lazy Retirement," a power-aware improvement to Intel's P6 family microarchitecture. Lazy Retirement significantly reduces the energy and power involved in register…
Power awareness through selective dynamically optimized traces
It is shown that the PARROT-based microarchitecture can improve the performance of aggressively designed processors by providing the means to improve the utilization of their more elaborate resources, and provides the key to attenuating increases in the power budget.
DynaMOS: Dynamic schedule migration for heterogeneous cores
DynaMOS provisions the little core with an OinO mode to replay a speculative schedule while ensuring program correctness, and schedules 38% of execution on the little core on average, increasing utilization of the energy-efficient core by 2.9x over prior work.
Multicore architecture optimizations for HPC applications
This thesis explores HPC-specific optimizations that make better use of the available transistors: it improves performance by transparently executing parallel code across multiple GPU accelerators, and investigates multi-socket NUMA GPU designs, showing that significant changes are needed to both the GPU interconnect and cache architectures to achieve performance scalability.
Synthesis Lectures on Computer Architecture
This book aims to document some of the most important architectural techniques that were invented, proposed, and applied to reduce both dynamic and static power dissipation in processors and memory hierarchies, by focusing on their common characteristics.
Performance Analysis of Complex Shared Memory Systems
It is shown that the proposed methodology for the identification of meaningful hardware performance counters yields useful metrics for the localization of memory-related performance limitations.


Alternative fetch and issue policies for the trace cache fetch mechanism
Presents a performance comparison between a trace cache implementing partial matching and inactive issue and an aggressive single-block fetch mechanism; the trace cache increases performance by an average of 25% over the instruction cache.
eXtended block cache
This paper describes a new instruction-supply mechanism, called the eXtended Block Cache (XBC). The goal of the XBC is to improve on the Trace Cache (TC) hit rate, while providing the same bandwidth.
Pipeline gating: speculation control for energy reduction