Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. High-radix networks, however, require longer cables than their low-radix counterparts. Because cables dominate network cost, the number of cables, and particularly the number of long, global cables ...
The CRAY T3E is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 microprocessor. The system includes a number of novel architectural features designed to tolerate latency, enhance scalability, and deliver high performance on scientific and engineering codes. Included among these are stream buffers, which detect and prefetch down ...
This paper investigates a complexity-effective technique for verifying a highly distributed directory-based cache coherence protocol. We develop a novel approach called "witness strings" that combines both formal and informal verification methods to expose design errors within the cache coherence protocol and its Verilog implementation. In this approach a ...
Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. This migration from low-radix to high-radix routers is demonstrated by the recent introduction of high-radix routers, which are expected to impact networks used in large-scale systems such as ...
This paper describes the radix-64 folded-Clos network of the Cray BlackWidow scalable vector multiprocessor. We describe the BlackWidow network, which scales to 32K processors with a worst-case diameter of seven hops, and the underlying high-radix router microarchitecture and its implementation. By using a high-radix router with many narrow channels we are ...
This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit ...
This paper looks at work done on a case-based workforce scheduling application. Generalised patterns of workforce allocation are used to build up a schedule that is then adjusted to remove any problems. Because some constraint elements are incorporated in the case-base, the global problem search space is reduced. The case-base can be maintained either ...
More powerful processors and an increasing emphasis on networking have created the need for a new supercomputer interconnect. This interconnect must of course provide high bandwidth, but also high reliability, flexibility, and scalability of both cost and performance. Cray Research has developed the GigaRing channel to satisfy these demands. GigaRing is a ...
Modern large-scale multiprocessors, capable of scaling to hundreds or thousands of processors, have proven to be very difficult to design and verify in a timely manner. In particular, the verification process, i.e., proving that the design is functionally correct, is often the most time-consuming aspect of developing the system. This paper discusses the ...
Accurately estimating congestion for proper global adaptive routing decisions (i.e., determining whether a packet should be routed minimally or non-minimally) has a significant impact on overall performance for high-radix topologies, such as the Dragonfly topology. Prior work has focused on understanding near-end congestion - i.e., congestion that occurs at ...