Learn More
Power dissipation is unevenly distributed in modern microprocessors leading to localized hot spots with significantly greater die temperature than surrounding cooler regions. Excessive junction temperature reduces reliability and can lead to catastrophic failure. We examine the use of activity migration which reduces peak junction temperature by moving(More)
—An 8 Gb 4-stack 3-D DDR3 DRAM with through-Si-via is presented which overcomes the limits of conventional modules. A master-slave architecture is proposed which decreases the standby and active power by 50 and 25%, respectively. It also increases the I/O speed to 1600 Mb/s for 4 rank/module and 2 module/channel case since the master isolates all chip I/O(More)
This paper explores the power implications of replacing global chip wires with an on-chip network. We optimize network links by varying repeater spacing, link pipelining, and voltage scaling, to significantly reduce the energy to send a bit across chip. We develop an analytic model of large chip designs with an on-chip two-dimensional mesh network and(More)
Leakage power is dominated by critical paths, and hence dynamic deactivation of fast transistors can yield large savings. We introduce metrics for comparing fine-grain dynamic deactivation techniques that include the effects of deactivation energy and startup latencies, as well as long-term leakage current. We present a new circuit-level technique for(More)
A Leakage-Biased Domino circuit family is proposed that maintains high speed in active mode but which can be rapidly placed into a low-leakage inactive state by using leakage currents themselves to bias internal nodes. A 32-bit Han-Carlson domino adder circuit is used to compare LB-Domino with conventional single and dual Vt domino circuits. For equal delay(More)
paper presents new techniques to evaluate the energy and delay of flip-flop and latch designs and shows that no single existing design performs well across the wide range of operating regimes present in complex systems. We propose the use of a selection of flip-flop and latch designs, each tuned for different activation patterns and speed requirements. We(More)
SyCHOSys (Synchronous Circuit Hardware Orchestration System) generates high-speed energy-performance cycle simulators by compiling a processor description into efficient C++ code. This framework can custom compile a cycle simulator with arbitrary mixed levels of simulation detail ranging from gate-level to purely behavioral models. In addition, SyCHOSys can(More)
This thesis proposes vector-thread architectures as a performance-efficient solution for all-purpose computing. The VT architectural paradigm unifies the vector and multithreaded compute models. VT provides the programmer with a control processor and a vector of virtual processors. The control processor can use vector-fetch commands to broadcast(More)
Different flip-flop designs vary in the number and complexity of logic stages they contain, and hence have different inherent parasitic delays and output drive strengths. We examine the effect of electrical load on flip-flop delay and energy consumption and show that the relative ranking of optimized flip-flop structures varies widely with both electrical(More)