An argument for simple COMA
@article{Saulsbury1995AnAF, title={An argument for simple COMA}, author={Ashley Saulsbury and Tim Wilkinson and John B. Carter and Anders Landin}, journal={Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture}, year={1995}, pages={276-285} }
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication capabilities of cache-only memory architecture (COMA) machines, without the accompanying hardware complexity. A software layer manages cache space allocation at a page granularity, similarly to distributed virtual shared memory (DVSM) systems, leaving simpler hardware to maintain shared memory…
49 Citations
Cache-Only Memory Architectures
- Computer Science, Computer
- 1999
The authors explain the functionality, architecture, performance, and complexity of COMA systems, compare COMA to traditional nonuniform memory access (NUMA) systems, and describe proposed improvements in NUMA systems that target the same performance obstacles as COMA.
Research Feature Cache-Only Memory Architectures
- Computer Science
The functionality, architecture, performance, and complexity of COMA systems are explained, different COMA designs are outlined, COMA is compared to traditional cache-coherent non-uniform memory access (NUMA) systems, and proposed improvements in NUMA systems that target the same performance obstacles as COMA are described.
The Impact of Memory Organization in Hybrid DSM
- Computer Science
- 1997
This study compares the design issues and performance consequences of adopting, in hybrid DSM, four memory organizations inspired by existing architectures: CC-NUMA, RC-NUMA, S-COMA, and COMA.
PRISM: an integrated architecture for scalable shared memory
- Computer Science, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture
- 1998
Adaptive, run-time policies that take advantage of PRISM's ability to dynamically configure shared memory pages with different behaviors significantly outperform pure CC-NUMA or Simple-COMA configurations and are usually within 10% of optimal performance.
Reducing Remote Conflict Misses in Shared-Memory Multiprocessors: NUMA with Remote Cache and COMA
- Computer Science
- 2007
To compare the performance of the two organizations for the same amount of total memory, a model of data sharing is introduced that uses three data sharing patterns: replication, read-mostly migration, and read-write migration.
2 CC-NUMA and COMA-F Architectures
- Computer Science
- 2002
Distributed shared memory multiprocessors with cache coherent non-uniform memory architectures (CC-NUMA) have become popular in the memory design of multiprocessors in recent years. The shared data…
Evaluating the Memory Performance of a ccNUMA System
- Computer Science
- 2000
This work presents a detailed memory performance analysis of a particular ccNUMA system (the SGI Origin 2000) and presents a new memory profiling tool and a new set of microbenchmark codes, called snbench, which make such a fine-grained memory performance analysis possible.
COMA: An Opportunity for Building Fault-Tolerant Scalable Shared Memory Multiprocessors
- Computer Science
- 1995
The class of Cache Only Memory Architectures (COMA) are good candidates for building fault-tolerant SSMMs, and a backward error recovery strategy can be implemented without significant hardware modification to previously proposed COMA by exploiting their standard replication mechanisms and extending the coherence protocol to transparently manage recovery data.
Simple COMA Shared Memory and the RS/6000 SP White Paper
- Computer Science
- 1996
The Simple COMA shared memory architecture and the potential implementation of this architecture on the IBM RS/6000 SP parallel computer are explained.
A Dual Address Space Architecture: Implementation and Evaluation
- Computer Science
- 2009
This dissertation proposes changes to a hardware-based DSM architecture that allow users to use two address spaces to gain the scalability of distributed architectures while retaining the benefits of the shared address space architecture.
References
Simple COMA node implementations
- Computer Science, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences
- 1994
The authors introduce the idea of a simple COMA architecture, a hybrid with hardware support only for the functionality frequently used; because of its simplicity, it should be quick and cheap to develop and engineer.
Experimental comparison of memory management policies for NUMA multiprocessors
- Computer Science, TOCS
- 1991
The results show that there are memory management policies implemented in the system that can improve the performance of programs written using the simpler uniform memory access (UMA) programming model, and there appears to be no single policy that can be considered the best over a set of test applications.
The directory-based cache coherence protocol for the DASH multiprocessor
- Computer Science, ISCA '90
- 1990
The design of the DASH coherence protocol is presented, how it addresses the issues of correctness, performance, and protocol complexity is discussed, and the protocol is compared to the IEEE Scalable Coherent Interface protocol.
Memory coherence in shared virtual memory systems
- Computer Science, TOCS
- 1989
Both theoretical and practical results show that the memory coherence problem can indeed be solved efficiently on a loosely coupled multiprocessor.
SPLASH: Stanford parallel applications for shared-memory
- Computer Science, CARN
- 1992
This work presents the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems, and describes the applications currently in the suite in detail.
Implementation and performance of Munin
- Computer Science, SOSP '91
- 1991
This work evaluates the implementation of Munin and describes the execution of two Munin programs that achieve performance within ten percent of message passing implementations of the same programs.
Evaluating the memory overhead required for COMA architectures
- Computer Science, Proceedings of 21 International Symposium on Computer Architecture
- 1994
Simulation data shows that the frequency of data reshuffling is sensitive to the allocation policy and associativity of the memory but is relatively unaffected by the block size chosen, and that data replication in the attraction memory is important for good performance, but most gains can be achieved through replication in the processor caches.
DDM - A Cache-Only Memory Architecture
- Computer Science, Computer
- 1992
The Data Diffusion Machine (DDM), a cache-only memory architecture (COMA) that relies on a hierarchical network structure, is described and simulated performance results are presented.
Tempest and Typhoon: user-level shared memory
- Computer Science, Proceedings of 21 International Symposium on Computer Architecture
- 1994
The authors simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably to an all-hardware Dir_N NB cache-coherence protocol for five shared-memory programs.
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
- Computer Science, SIGMETRICS '93
- 1993
A new technique for evaluating cache-coherent, shared-memory computers is developed: the Wisconsin Wind Tunnel (WWT), which correctly interleaves target machine events and calculates target program execution time.