Current-generation DSM systems use point-to-point (unicast) messages for cache invalidations. This incurs a large number of control messages, heavy network traffic, and high occupancy at home nodes. This paper introduces a new approach to reduce these overheads by using multidestination-based reservation and gather worms for distributing invalidation…
Most recent research on distributed shared memory (DSM) systems has focused on either careful design of node controllers or cache coherence protocols. In evaluating these designs, simplified network models (constant latency, or average latency based on the network size) are typically used. Such models completely ignore network contention. To help…
Modern high-performance networks used for scalable distributed shared memory (DSM) systems support multiple paths to increase bandwidth and/or reduce contention. Such networks violate the constraint of pairwise in-order message delivery implicitly required by many existing directory-based cache coherence protocols. To solve this problem, two…
Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applications, and their significance is growing as network latency increases relative to processor speeds. This paper proposes two mechanisms that improve shared memory performance by…
Components of modern parallel systems are becoming quite complex, with many features and variations. Integrated modeling of these components (interconnection network, messaging layer, programming model, and computation-communication characteristics of applications) is essential to derive design guidelines for next-generation parallel systems. Most of the…
We consider here two basic fault-secure scheduling problems for multiprocessor systems. First, given the number of processors in the system and a set of computational tasks of unit length expressed as a complete binary tree, a scheduling algorithm is proposed such that the total execution time is a minimum and no undetected single error result will be…
Networks of workstations (NOWs) are becoming increasingly popular as an alternative to parallel computers. Typically, these networks present irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. Similar to the evolution of parallel computers, NOWs are also evolving from…
Many-core chip multiprocessors can be expected to scale to tens of cores and beyond in the near future. Existing and emerging workloads on general-purpose many-core processors typically exhibit fast-changing, unpredictable, bursty, and jittery on-chip communication traffic between different functional blocks. To provide high sustainable…