Cray Cascade: A scalable HPC system based on a Dragonfly network
TLDR
This paper presents the architecture of the Cray Cascade system, a distributed memory system based on the Dragonfly network topology, and describes a set of advanced features supporting both mainstream high performance computing applications and emerging global address space programming models.
The Cray BlackWidow: a highly scalable vector multiprocessor
TLDR
The BlackWidow system is a distributed shared memory architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops.
Vector instruction set support for conditional operations
TLDR
An approach using masked operations is shown to be one of the better methods, especially if its implementation can skip over blocks of false mask bits. The paper concludes with a practical implementation of masked operations that skips over power-of-2-length blocks of false values.
Cache performance in vector supercomputers
TLDR
The data fetch strategy is found to be a significant parameter affecting performance, the performance of several fetch policies is evaluated, and it is shown that small fetch sizes improve performance by maximizing the use of available memory bandwidth.
A CMOS Vector Processor with a Custom Streaming Cache