Much of the improvement in computer performance over the last twenty years has come from faster transistors and from architectural advances that increase parallelism. Historically, parallelism has been exploited either at the instruction level, with a grain size of a single instruction, or by partitioning applications into coarse threads with grain sizes of …
The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The M-Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 12 function units, on-chip cache, and …
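The abstract says only that the nodes are connected by a 3-D mesh. To show what that topology implies for communication distance, here is a minimal sketch that assumes dimension-order routing (correct X, then Y, then Z). This is a common mesh routing scheme, but the abstract does not attribute it to the M-Machine. The coordinates and the function names `hop_count` and `dimension_order_route` are hypothetical. In a mesh without wraparound links, the minimal number of hops between two nodes is the Manhattan distance between their coordinates.

```python
from typing import List, Tuple

# Hypothetical node address: (x, y, z) position in the 3-D mesh.
Coord = Tuple[int, int, int]


def hop_count(src: Coord, dst: Coord) -> int:
    """Minimal number of links between two nodes of a 3-D mesh (no wraparound)."""
    return sum(abs(a - b) for a, b in zip(src, dst))


def dimension_order_route(src: Coord, dst: Coord) -> List[Coord]:
    """Nodes visited when routing fully in X first, then Y, then Z (assumed scheme)."""
    cur = list(src)
    path: List[Coord] = [src]
    for dim in range(3):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step
            path.append((cur[0], cur[1], cur[2]))
    return path


route = dimension_order_route((0, 0, 0), (2, 1, 1))
print(route)  # [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 1, 1)]
print(hop_count((0, 0, 0), (2, 1, 1)))  # 4
```

Because each step corrects exactly one coordinate by one unit, the route always uses the minimal hop count, `len(route) - 1 == hop_count(src, dst)`.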
Exceptions have traditionally been used to handle infrequently occurring and unpredictable events during normal program execution. Current trends in microprocessor and operating systems design continue to increase the cost of event handling. Because of the deep pipelines and wide out-of-order superscalar architectures of contemporary microprocessors, an …
Inter-chip I/O bandwidth is a critical bottleneck in VLSI systems. To make the best use of this resource, the conventions and circuits used for inter-chip signaling must be optimized to achieve the maximum bit rate with minimum power dissipation. This paper describes a set of I/O pads that we have developed at MIT. They operate with small signal levels to …
Continuing reductions in on-chip geometries yield increasing numbers of transistors per chip and fundamentally faster devices but also result in effectively slower wires. This combination presents significant challenges for new microprocessor architectures. The disparity in performance between on-chip arithmetic units and memory creates longer effectively …
We present a user-level message interface that provides high performance and very low processor overhead. In this system, messages are launched from within the user's general register file, and received in a hardware queue mapped to a general register. A message handler is started within the latency of a jump instruction upon arrival of the first message word, …
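Here is a minimal behavioral sketch of the interface described above. It rests on a hypothetical model in which the receive queue appears as a register, so that each read of that register pops the next message word. A handler is dispatched when words arrive at an empty queue. The names `MessageQueueRegister`, `send`, and `handler`, and the length-prefixed message layout, are illustrative assumptions rather than the paper's actual interface. The jump-latency dispatch timing is not modeled.

```python
from collections import deque
from typing import Callable, Deque, List


class MessageQueueRegister:
    """Hypothetical model of a hardware receive queue mapped to a general register.

    Each read() pops the next message word, as reading the mapped register would.
    """

    def __init__(self, handler: Callable[["MessageQueueRegister"], None]) -> None:
        self._words: Deque[int] = deque()
        self._handler = handler

    def arrive(self, words: List[int]) -> None:
        # Dispatch only on the empty -> non-empty transition, standing in for
        # "start a handler upon arrival of the first message word".
        was_empty = not self._words
        self._words.extend(words)
        if was_empty and words:
            self._handler(self)

    def read(self) -> int:
        if not self._words:
            # Real hardware would stall the reading thread instead of failing.
            raise RuntimeError("receive queue is empty")
        return self._words.popleft()


def send(register_file: List[int], first: int, count: int,
         dest: MessageQueueRegister) -> None:
    """Launch a message built from `count` consecutive general registers."""
    dest.arrive(register_file[first:first + count])


# Usage: by assumption, the first word of each message is the payload length.
received: List[List[int]] = []


def handler(q: MessageQueueRegister) -> None:
    n = q.read()
    received.append([q.read() for _ in range(n)])


queue = MessageQueueRegister(handler)
regs = [0, 0, 3, 10, 20, 30, 0, 0]  # r2 holds the length, r3..r5 the payload
send(regs, 2, 4, queue)
print(received)  # [[10, 20, 30]]
```

Modeling the queue as a register-like `read()` reflects the abstract's point that no separate load or copy step stands between message arrival and use. The handler consumes words directly from the queue.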