Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is non-trivial. In this paper we propose alternative switch architectures with differing buffer organizations to implement...
Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barrier synchronization through software, hardware, or a combination of these mechanisms. However, few of these schemes emphasize fault-tolerant barrier operations. In this paper, we describe inexpensive support that can be...
This paper proposes a new approach for implementing fast multicast and broadcast in multistage interconnection networks (MINs) with multiport encoded multidestination worms. For a MIN with k × k switches and n stages, such worms use n header flits each. One flit is used for each stage of the network, and it indicates the output ports to which a multicast...
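The header layout described above (n flits, one per stage, each naming a set of output ports) can be illustrated with a minimal sketch. It is not taken from the paper: the function names `encode_header` and `covered_destinations` are hypothetical, each flit is assumed to be a k-bit mask, and the network is assumed to use destination-tag routing in which stage i selects its output port from the i-th base-k digit of the destination address (most significant digit first). Under those assumptions, one worm reaches the Cartesian product of the per-stage port sets.

```python
from itertools import product


def encode_header(port_sets, k):
    """Build one k-bit mask per stage (hypothetical layout).

    Bit p of a flit is set when the worm should be forwarded
    to output port p of the switch at that stage.
    """
    flits = []
    for ports in port_sets:
        mask = 0
        for p in ports:
            if not 0 <= p < k:
                raise ValueError(f"port {p} out of range for a {k}x{k} switch")
            mask |= 1 << p
        flits.append(mask)
    return flits


def covered_destinations(flits, k):
    """List the destinations one worm reaches.

    Assumption: destination-tag routing, where stage i uses the
    i-th base-k digit of the address, most significant first.
    """
    port_sets = [[p for p in range(k) if (mask >> p) & 1] for mask in flits]
    dests = []
    for digits in product(*port_sets):
        addr = 0
        for d in digits:
            addr = addr * k + d  # accumulate base-k digits into an address
        dests.append(addr)
    return sorted(dests)


# Example: 4x4 switches, 2 stages (16 destinations in total).
header = encode_header([[0, 2], [1, 3]], k=4)
print(header)                           # [5, 10]
print(covered_destinations(header, 4))  # [1, 3, 9, 11]
```

In this example a two-flit header covers four destinations with a single worm. An arbitrary destination set that is not a product of per-stage port sets would, under the same assumptions, need more than one worm.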
Multidestination message passing has been proposed as a mechanism to achieve efficient multicast in regular direct and indirect networks. The application of this technique to parallel systems based on irregular networks has, however, not been studied. In this paper we propose two schemes for performing multicast using multidestination worms on irregular...
Trace-driven simulation is an important aid in performance analysis of computer systems. Capturing address traces for these simulations is a difficult problem for single processors and particularly for multicomputers. Even when existing trace methods can be used on multicomputers, the amount of collected data typically grows with the number of processors,...