Dynamically Trading Frequency for Complexity in a GALS Microprocessor

@inproceedings{Dropsho2004DynamicallyTF,
  title={Dynamically Trading Frequency for Complexity in a GALS Microprocessor},
  author={Steven G. Dropsho and Greg Semeraro and David H. Albonesi and Grigorios Magklis and Michael L. Scott},
  booktitle={37th International Symposium on Microarchitecture (MICRO-37'04)},
  year={2004},
  pages={157--168}
}
  • S. Dropsho, G. Semeraro, M. Scott
  • Published 4 December 2004
  • Computer Science
  • 37th International Symposium on Microarchitecture (MICRO-37'04)
Microprocessors are traditionally designed to provide "best overall" performance across a wide range of applications and operating environments. Several groups have proposed hardware techniques that save energy by "downsizing" hardware resources that are underutilized by the current application phase. Others have proposed a different energy-saving approach: dividing the processor into domains and dynamically changing the clock frequency and voltage within each domain during phases when the full… 
Dynamic MIPS rate stabilization in out-of-order processors
TLDR
This paper demonstrates that the execution time of an OoO processor can be stable and predictable by controlling its MIPS rate via a PID (Proportional, Integral, and Differential gain) feedback controller and DVFS (Dynamic Voltage and Frequency Scaling).
Dynamic MIPS Rate Stabilization for Complex Processors
TLDR
The execution time of in-order, out-of-order (OoO), and OoO simultaneous multithreaded processors can be stable and predictable by stabilizing their mega instructions executed per second (MIPS) rate via a proportional, integral, and differential (PID) gain feedback controller and dynamic voltage and frequency scaling (DVFS).
Drowsy Cache Partitioning for Multithreaded Systems and High Level Caches
TLDR
A phase adaptive cache that will reduce both static and dynamic power while having very little impact on the performance is implemented for all three levels of cache in a multicore architecture.
Alternative Timing in Digital Logic
TLDR
This work proposes that alternative timing schemes have as yet untapped potential and warrant further industry focus and research, and examines existing and new ideas in circuit timing, with a focus on microprocessors.
Simulating a LAGS processor to consider variable latency on L1 D-Cache
TLDR
A Locally-Asynchronous Globally-Synchronous (LAGS) superscalar microarchitecture is presented in which read operations on a variable-latency L1 data cache are managed through an asynchronous wrapper; its feasibility is shown by running SPEC2000 benchmarks, and it is offered as an asynchronous approach to improving processor performance using this feature.
A workload adaptive voltage scaling multiple clock domain architecture
This thesis presents a comprehensive system for allowing a Multiple Clock Domain (MCD) processor to adapt to its workload in an efficient manner. We present adaptive techniques at both the
Power reduction techniques for microprocessor systems
TLDR
It is concluded that power management is a multifaceted discipline that is continually expanding with new techniques being developed at every level and it remains too early to tell which techniques will ultimately solve the power problem.
Drowsy cache partitioning for reduced static and dynamic energy in the cache hierarchy
TLDR
This work proposes the use of a phase adaptive cache design to reduce both leakage and dynamic power consumption with very little impact on the overall performance and test the design on a private second level cache.
Design of a distributed memory unit for clustered microarchitectures
TLDR
A detailed study is presented that compares the proposed distributed memory unit to a centralized memory unit and confirms its advantages of reduced energy usage and of improved performance.
Effective management of multiple configurable units using dynamic optimization
TLDR
This paper proposes an ACE management framework for efficient management of multiple CUs, utilizing dynamic optimization systems' inherent capabilities of detecting and optimizing program hotspots, i.e., dominant code regions, and develops a scheme where hotspot boundaries are used for phase detection and adaptation.

References

SHOWING 1-10 OF 43 REFERENCES
Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling
TLDR
An alternative approach is described, which is called a multiple clock domain (MCD) processor, in which the chip is divided into several clock domains, within which independent voltage and frequency scaling can be performed.
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
TLDR
Experimental results indicate that the profile-driven approach is more stable than hardware-based reconfiguration, and yields virtually all of the energy-delay improvement achieved via offline analysis.
Dynamic frequency and voltage control for a multiple clock domain microarchitecture
TLDR
An on-line algorithm to dynamically control the frequency/voltage of a Multiple Clock Domain (MCD) microarchitecture is described, allowing energy savings when the frequency of some regions can be reduced without significantly impacting performance.
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
TLDR
Experimental results indicate that the profile-driven approach is more stable than hardware-based reconfiguration, and yields virtually all of the energy-delay improvement achieved via off-line analysis.
Integrating adaptive on-chip storage structures for reduced dynamic power
TLDR
This work introduces a novel cache design that permits direct calculation of efficient configurations for buffer and queue structures and shows energy savings of up to 70% on the individual structures, and savings averaging 30% overall for the portion of energy attributed to these structures.
Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures
TLDR
This paper proposes a cache and TLB layout and design that leverages repeater insertion to provide dynamic low-cost configurability, trading off size and speed on a per-application-phase basis, and demonstrates that a configurable L2/L3 cache hierarchy coupled with a conventional L1 results in an average 43% reduction in memory hierarchy energy in addition to improved performance.
Adapting Processor Supply Voltage to Instruction-Level Parallelism
TLDR
The technique monitors a program’s instruction-level parallelism (ILP) and adjusts processor voltage and speed in response to the amount of observed ILP, which improves energy consumption by an average of 47%.
Hiding synchronization delays in a GALS processor microarchitecture
TLDR
It is shown that by adding out-of-order superscalar execution capabilities to a simpler microarchitecture, such as an Intel StrongARM-like processor, as much as 62% of the performance degradation caused by synchronization delays can be eliminated.
Interfacing synchronous and asynchronous modules within a high-speed pipeline
  • A. E. Sjogren, C. Myers
  • Computer Science, Engineering
    Proceedings Seventeenth Conference on Advanced Research in VLSI
  • 1997
TLDR
This paper describes a new technique for integrating asynchronous modules within a high-speed synchronous pipeline by using a clock generated by a stoppable ring oscillator, which is capable of driving the large clock load found in present day microprocessors.
Compiler-Directed Dynamic Frequency and Voltage Scheduling
TLDR
A compilation strategy is discussed that identifies opportunities for dynamic voltage and frequency scaling of the CPU without significant increase in overall program execution time by introducing a simple, yet effective performance model to determine an efficient CPU slowdown factor for memory bound loop computations.