Brian K. Flachs

When IBM, Sony, and Toshiba launched the Cell project in 2000, the design goal was to improve performance by an order of magnitude over that of desktop systems shipping in 2005. To meet that goal, designers had to optimize performance against area, power, volume, and cost, but single-core designs clearly offered diminishing returns on investment. If increased…
Resonant clock distributions have the potential to save power by recycling energy from cycle to cycle while at the same time improving performance by reducing the clock distribution latency and filtering out non-periodic noise. While these features have been successfully demonstrated in several small-scale experiments, there remained a number of concerns…
…implementation of the synergistic processor in 65-nm and 90-nm SOI. B. Flachs, S. Asano, S. H. Dhong, H. P. Hofstee, G. Gervais, R. Kim, T. Le, P. Liu, J. Leenstra, J. S. Liberty, B. Michael, H.-J. Oh, S. M. Mueller, O. Takahashi, K. Hirairi, A. Kawasumi, H. Murakami, H. Noro, S. Onishi, J. Pille, J. Silberman, S. Yong, A. Hatakeyama, Y. Watanabe, N. Yano, D. A. Brokenshire, M. …
A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8 mm² using a 90 nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture and physical implementation are…
This paper presents a design methodology emphasizing early and quick timing closure for high-frequency microprocessor designs. This methodology was used to design a gigahertz-class PowerPC microprocessor with 19 million transistors. Characteristics of “Timing Closure by Design” are 1) logic partitioned on timing boundaries, 2) predictable control…
The Cell BE design methodology is described, which enables high-frequency, high-performance, power-efficient, and area-optimized design. It includes a hierarchical design style, a clean clock boundary, effective use of non-scan latches, at-speed scan testing, custom-design-like synthesized macros, a fine-grained clock gating scheme, and cycle-accurate power…
The Hector Distributed Run-Time Environment provides a fully integrated run-time environment and scheduling system for MPI programs over networked computer resources. This paper describes the modifications to an MPI implementation needed to make task migration and checkpointing possible, and recent experiments in improved scheduling and optimization. It…
Many institutions already have networks of workstations, which could potentially be harnessed as a powerful parallel processing resource. A new, automatic task allocation system has been built on top of MPI, an environment that permits parallel programming using the message-passing paradigm and is implemented as extensions to C and FORTRAN. This system,…