The higher global bandwidth requirements of many applications, together with lower network cost, have motivated the use of the Dragonfly network topology in high-performance computing systems. In this paper we present the architecture of the Cray Cascade system, a distributed-memory system based on the Dragonfly [1] network topology. We describe the structure of the …
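As a rough illustration of how a Dragonfly scales, the sketch below sizes a canonical Dragonfly as described in the original proposal by Kim et al.: each group has `a` routers connected all-to-all, each router serves `p` terminals and has `h` global links, and every group has a direct global link to every other group. This is the generic topology, not Cascade's specific configuration; the function names and the example parameter values are hypothetical.

```python
def dragonfly_size(p: int, a: int, h: int) -> dict:
    """Size a canonical Dragonfly.

    p: terminals per router, a: routers per group, h: global links per router.
    """
    if min(p, a, h) < 1:
        raise ValueError("p, a and h must be positive")
    # Each group has a*h global links, so it can reach at most a*h other groups.
    groups = a * h + 1
    terminals = a * p * groups
    # Router ports: p terminal ports + (a - 1) local ports + h global ports.
    radix = p + (a - 1) + h
    return {"groups": groups, "terminals": terminals, "router_radix": radix}


def is_balanced(p: int, a: int, h: int) -> bool:
    """Balanced channel load in the canonical design: a = 2p = 2h."""
    return a == 2 * p == 2 * h


print(dragonfly_size(p=4, a=8, h=4))
print(is_balanced(4, 8, 4))
```

With `p = h = 4` and `a = 8`, routers of radix 15 reach 33 groups and 1,056 terminals. The point of the design is visible in the formula: the terminal count grows roughly with the square of `a * h`, so a modest router radix yields a large network with only one global hop between any two groups.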
A parallel algorithm is presented for sorting $n$ elements evenly distributed over $2^d = p$ nodes of a $d$-dimensional hypercube. The average running time of the algorithm is $O\left(\frac{n \log n}{p} + p \log^2 n\right)$. The algorithm maintains a perfect load balance across the nodes by determining in advance the $(kn/p)$-th elements ($k = 1, \ldots, p-1$) of the final sorted list. …
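The load-balancing idea can be sketched as a sequential simulation: find the global $(kn/p)$-th keys first, then route every key to the node whose rank interval contains it, so each node ends up with exactly $n/p$ elements. This is not the paper's hypercube algorithm; there is no message passing here, and the counting binary search used for selection is a simple stand-in chosen for clarity. Breaking ties by (node id, local index) is an assumption so that duplicate values still get distinct ranks, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Key = Tuple[int, int, int]  # (value, node id, local index): unique tie-broken key


def count_le(sorted_nodes: List[List[Key]], x: Key) -> int:
    """Global number of keys <= x (each node answers with a local binary search)."""
    return sum(bisect_right(s, x) for s in sorted_nodes)


def select_rank(sorted_nodes: List[List[Key]], r: int) -> Key:
    """Return the r-th smallest key (1-based) across all nodes."""
    for s in sorted_nodes:
        lo, hi = 0, len(s) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            c = count_le(sorted_nodes, s[mid])
            if c == r:
                return s[mid]
            if c < r:
                lo = mid + 1
            else:
                hi = mid - 1
    raise ValueError("rank out of range")


def load_balanced_sort(nodes: List[List[int]]) -> List[List[int]]:
    p = len(nodes)
    n = sum(len(node) for node in nodes)
    if p == 0 or n % p != 0:
        raise ValueError("sketch assumes n is a multiple of p")
    q = n // p  # every node ends up with exactly n/p elements
    # Step 1: each node sorts its own keys, tagged so duplicates get distinct ranks.
    local = [sorted((v, i, j) for j, v in enumerate(node)) for i, node in enumerate(nodes)]
    # Step 2: find the (kn/p)-th keys, k = 1..p-1, before any data moves.
    splitters = [select_rank(local, k * q) for k in range(1, p)]
    # Step 3: route each key to bucket b = number of splitters strictly below it.
    buckets: List[List[Key]] = [[] for _ in range(p)]
    for s in local:
        for key in s:
            buckets[bisect_left(splitters, key)].append(key)
    # Step 4: each node sorts what it received and drops the tie-breaking tags.
    return [[key[0] for key in sorted(b)] for b in buckets]
```

For example, `load_balanced_sort([[5, 1, 9, 3], [2, 8, 2, 7], [6, 4, 0, 2]])` returns `[[0, 1, 2, 2], [2, 3, 4, 5], [6, 7, 8, 9]]`: the three copies of 2 straddle a node boundary, yet every node still receives exactly four elements.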
This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit …
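The peak figures quoted above can be unpacked with a little arithmetic. The sketch takes only the pipe count, the two Gflops figures and the processor count from the text. Two inputs are assumptions, labeled as such: that each vector pipe completes one add and one multiply per cycle (2 flops per cycle), and that "32K" means 32,768. Under those assumptions the numbers imply a clock rate and a full-system peak; they are back-of-the-envelope values, not specifications.

```python
# Figures quoted in the abstract.
VECTOR_PIPES = 8
PEAK_GFLOPS_64 = 20.8
PEAK_GFLOPS_32 = 41.6

# Assumption: "32K processors" taken as 32 * 1024.
MAX_PROCESSORS = 32 * 1024

# Assumption: each pipe retires one add and one multiply per cycle.
FLOPS_PER_PIPE_PER_CYCLE = 2

per_pipe_gflops_64 = PEAK_GFLOPS_64 / VECTOR_PIPES
implied_clock_ghz = per_pipe_gflops_64 / FLOPS_PER_PIPE_PER_CYCLE
precision_ratio = PEAK_GFLOPS_32 / PEAK_GFLOPS_64
system_peak_tflops_64 = MAX_PROCESSORS * PEAK_GFLOPS_64 / 1000

print(f"{per_pipe_gflops_64:.2f} Gflops per pipe (64-bit)")
print(f"{implied_clock_ghz:.2f} GHz implied clock under the 2 flops/cycle assumption")
print(f"32-bit peak is {precision_ratio:.1f}x the 64-bit peak")
print(f"{system_peak_tflops_64:.1f} Tflops aggregate 64-bit peak at full scale")
```

The result is about 2.6 Gflops per pipe, an implied clock of about 1.3 GHz, a 32-bit peak exactly twice the 64-bit peak, and an aggregate 64-bit peak of roughly 681.6 Tflops for a fully configured system.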
In this paper, we propose logic simulation techniques that use parallel and vector machines to reduce the simulation time of large digital circuits. Three algorithms for logic simulation have been developed and implemented on the Cray Y-MP supercomputer, a general-purpose shared-memory parallel machine with vector processors. The first algorithm is a vector …
In this paper, we present algorithms for logic and fault simulation, developed and implemented on the Cray Y-MP supercomputer, a general-purpose shared-memory parallel machine with vector processors. The parallel-and-vector version of the event-driven logic simulation algorithm achieves a speedup of 52 on the Cray Y-MP with 8 processors, with a maximum …
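For readers unfamiliar with event-driven logic simulation, the sketch shows the basic mechanism the parallel-and-vector version builds on: a time-ordered event queue, and re-evaluation of only those gates whose inputs actually changed. It is a plain sequential simulator, not the authors' Cray Y-MP implementation. The netlist format, the small gate library, integer gate delays and the transport-delay model are all assumptions made for illustration, and the names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical gate library: gate kind -> Boolean function of its input values.
GATE_FUNCS = {
    "AND": lambda xs: int(all(xs)),
    "OR": lambda xs: int(any(xs)),
    "NOT": lambda xs: 1 - xs[0],
    "XOR": lambda xs: sum(xs) % 2,
}

Gate = Tuple[str, Tuple[str, ...], str, int]  # (kind, input nets, output net, delay)


def simulate(gates: List[Gate], initial: Dict[str, int],
             stimuli: List[Tuple[int, str, int]], t_end: int):
    """Event-driven simulation with a transport-delay model.

    Returns the final net values and the trace of (time, net, value) changes.
    """
    values = dict(initial)
    fanout = defaultdict(list)  # net -> indices of gates that read it
    for idx, (_, ins, _, _) in enumerate(gates):
        for net in ins:
            fanout[net].append(idx)
    # Event queue ordered by time; the sequence number keeps ties in FIFO order.
    heap = [(t, seq, net, v) for seq, (t, net, v) in enumerate(stimuli)]
    heapq.heapify(heap)
    seq = len(heap)
    trace = []
    while heap and heap[0][0] <= t_end:
        now = heap[0][0]
        touched = set()
        # Phase 1: apply every event scheduled at `now`; only real changes count.
        while heap and heap[0][0] == now:
            _, _, net, v = heapq.heappop(heap)
            if values.get(net) != v:
                values[net] = v
                trace.append((now, net, v))
                touched.update(fanout[net])
        # Phase 2: re-evaluate only gates whose inputs changed, once each.
        for idx in sorted(touched):
            kind, ins, out, delay = gates[idx]
            new = GATE_FUNCS[kind]([values[n] for n in ins])
            heapq.heappush(heap, (now + delay, seq, out, new))
            seq += 1
    return values, trace
```

As a usage example, a half adder with `gates = [("XOR", ("a", "b"), "s", 2), ("AND", ("a", "b"), "c", 1)]`, all nets initially 0, and stimuli `[(0, "a", 1), (5, "b", 1)]` produces the change trace `[(0, "a", 1), (2, "s", 1), (5, "b", 1), (6, "c", 1), (7, "s", 0)]`. The event at time 1 (carry re-evaluated to 0) is discarded because it changes nothing. Skipping such non-events is the source of the event-driven method's efficiency, and also of its irregular workload, which is what makes parallelizing and vectorizing it nontrivial.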