A 32nm 3.1 billion transistor 12-wide-issue Itanium® processor for mission-critical servers

@article{Riedlinger2011A33,
  title={A 32nm 3.1 billion transistor 12-wide-issue Itanium{\textregistered} processor for mission-critical servers},
  author={Reid J. Riedlinger and Rohit Bhatia and Larry L. Biro and William J. Bowhill and Eric S. Fetzer and Paul E. Gronowski and Tom Grutkowski},
  journal={2011 IEEE International Solid-State Circuits Conference},
  year={2011},
  pages={84--86}
}
The next generation in the Intel® Itanium® processor family, code named Poulson, has eight multi-threaded 64-bit cores. Poulson is socket-compatible with the current Intel® Itanium® Processor 9300 series (Tukwila). The new design integrates a ring-based system interface derived from portions of previous Xeon® and Itanium® processors, and includes 32 MB of Last Level Cache (LLC). The processor is designed in Intel®'s 32nm CMOS… 
Citations

A 32 nm, 3.1 Billion Transistor, 12 Wide Issue Itanium® Processor for Mission-Critical Servers
TLDR
An Itanium® processor implemented in 32 nm CMOS with nine layers of Cu contains 3.1 billion transistors and has eight multi-threaded cores and a ring-based system interface; the combined on-die cache totals 50 MB.
A 667 MHz Logic-Compatible Embedded DRAM Featuring an Asymmetric 2T Gain Cell for High Speed On-Die Caches
TLDR
Circuit techniques for enhancing the retention time and random cycle time of logic-compatible embedded DRAMs (eDRAMs) are presented, and a half-swing write bit-line (WBL) scheme is adopted to improve WBL speed and reduce its power dissipation during write-back operation.
Harnessing Voltage Margins for Energy Efficiency in Multicore CPUs
TLDR
This paper presents the first automated system-level analysis of multicore CPUs based on the ARMv8 64-bit architecture when operated at scaled voltages, and proposes a new composite metric (severity) that aggregates the behavior of undervolted cores and can support system-operation and design-protection decisions.
Respin: Rethinking Near-Threshold Multiprocessor Design with Non-volatile Memory
TLDR
This paper presents an architecture that rethinks the cache hierarchy in near-threshold multiprocessors, and proposes a hardware-based core management system that dynamically consolidates virtual cores into variable numbers of physical cores to increase resource efficiency.
Improving the Reliability of Microprocessors under BTI and TDDB Degradations
Reliability is a fundamental challenge for current and future microprocessors with advanced nanoscale technologies. With smaller gates, thinner dielectrics, and higher temperatures, microprocessors are…
Cocoa: synergistic cache compression and error correction in capacity sensitive last level caches
TLDR
A novel technique, synergistic cache compression and error correction (Cocoa), is proposed to enable reliable low-voltage operation while preserving LLC capacity, and a new ECC scheme is introduced to minimize the space overhead of ECC on a per-segment basis.
Energy-Efficient SRAM Design in 28 nm FDSOI Technology
As CMOS scaling continues into the sub-32 nm regime, the effects of device variations become more prominent. This is especially critical in SRAMs, which use very small transistor dimensions to achieve high memory…
Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors
TLDR
This paper presents a new mechanism for dynamically reducing voltage margins while keeping the chip's operating frequency constant, and uses correctable-error reports raised by the hardware to identify the lowest safe operating voltage.
Energy-aware system design using circuit reconfigurability with a focus on low-power SRAMs
TLDR
This thesis focuses on the design of an energy-monitoring circuit that generates a digital representation of a circuit's absolute energy per operation; the approach is extended to a processor system for system-level power and performance optimization.
On-chip networks for manycore architecture
TLDR
Three techniques for improving the efficiency of on-chip interconnects are presented, along with ENC (Exclusive Native Context), the first deadlock-free, fine-grained thread migration protocol developed for on-chip networks.

References

SHOWING 1-10 OF 11 REFERENCES
A 65 nm 2-Billion Transistor Quad-Core Itanium Processor
TLDR
This paper describes an Itanium processor implemented in a 65 nm process with 8 layers of Cu interconnect, which has four dual-threaded cores, 30 MB of cache, and a system interface that operates at 2.4 GHz at 105 °C.
The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series
TLDR
The 16-way set-associative, single-ported 16-MB cache for the Dual-Core Intel Xeon Processor 7100 Series uses a 0.624 µm² cell in a 65-nm 8-metal technology to minimize both leakage and dynamic power.
Measurements and analysis of SER-tolerant latch in a 90-nm dual-VT CMOS process
TLDR
The proposed latch can improve the reliability of critical sequential logic elements in microprocessors and other circuits by utilizing local redundancy; the effects of recovery time, threshold-voltage assignment, and leakage on SER robustness are also analyzed.
The 16 kB single-cycle read access cache on a next-generation 64 b Itanium microprocessor
TLDR
A 16 kB, four-ported, physically addressed cache for a 64 b Itanium microprocessor operates at 1.2 GHz with 19.2 GB/s peak bandwidth and a single-cycle read access latency.
Clock distribution on a dual-core, multi-threaded Itanium®-family processor
TLDR
A region-based active de-skew system that reduces PVT-induced skew across the entire die during normal operation of the 90 nm Itanium® processor, code-named Montecito, is detailed.
High performance 32nm logic technology featuring 2nd generation high-k + metal gate transistors
A 32nm logic technology for high-performance microprocessors is described. 2nd generation high-k + metal gate transistors provide record drive currents at the tightest gate pitch reported for any…
A fully-bypassed 6-issue integer datapath and register file on an Itanium microprocessor
  • E. Fetzer, M. Gibson, Baker Mohammad
    2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315)
  • 2002
A 6-issue integer datapath with a 20-ported 128×65-bit register file in a 0.18 µm process operates at up to 1.2 GHz at 1.5 V. Operands bypass through 4 stages, from 34 locations, using…
The scaling of data sensing schemes for high speed cache design in sub-0.18 µm technologies
Small-signal differential data sensing for on-chip cache design is evaluated from the perspective of technology scaling. Maintaining the delay scaling trend and high area efficiency is getting more…
Voltage transient detection and induction for debug and test
TLDR
A system that enables voltage-transient detection and the controlled induction of voltage transients is described, along with its limitations and options for future improvement.