launched the Cell project 1 in 2000, the design goal was to improve performance an order of magnitude over that of desktop systems shipping in 2005. To meet that goal, designers had to optimize performance against area, power, volume, and cost, but clearly single-core designs offered diminishing returns on investment. 1-3 If increased efficiency was the…
Resonant clock distributions have the potential to save power by recycling energy from cycle to cycle while at the same time improving performance by reducing the clock distribution latency and filtering out non-periodic noise. While these features have been successfully demonstrated in several small-scale experiments, there remained a number of concerns…
The Hector Distributed Run-Time Environment provides a fully integrated run-time environment and scheduling system for MPI programs over networked computer resources. This paper describes the modifications to an MPI implementation needed to make task migration and checkpointing possible, and recent experiments in improved scheduling and optimization. It…
A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8 mm² using a 90 nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture and physical implementation are…
This paper presents a design methodology emphasizing early and quick timing closure for high-frequency microprocessor designs. This methodology was used to design a gigahertz-class PowerPC microprocessor with 19 million transistors. Characteristics of "Timing Closure by Design" are 1) logic partitioned on timing boundaries, 2) predictable control…
This paper describes the architecture and implementation of the original gaming-oriented synergistic processor element (SPE) in both 90-nm and 65-nm silicon-on-insulator (SOI) technology and introduces a new SPE implementation targeted for the high-performance computing community. The Cell Broadband Engine processor contains eight SPEs. The dual-issue,…
The first-generation Cell Broadband Engine processor introduced the Cell architecture, which consists of nine processor cores fabricated in 90 nm CMOS SOI technology. This paper describes the advances made by moving the Cell Broadband Engine design from 90 nm CMOS SOI to 65 nm CMOS SOI.