An argument for simple COMA

@article{Saulsbury1995AnAF,
  title={An argument for simple COMA},
  author={Ashley Saulsbury and Tim Wilkinson and John B. Carter and Anders Landin},
  journal={Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture},
  year={1995},
  pages={276--285}
}
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication capabilities of cache-only memory architecture (COMA) machines, without the accompanying hardware complexity. A software layer manages cache space allocation at page granularity, similarly to distributed virtual shared memory (DVSM) systems, leaving simpler hardware to maintain shared memory… 
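The abstract describes a split of responsibilities: software allocates local cache space one page at a time, and simpler hardware maintains shared memory. The Python sketch below models that split. It assumes a 4 KiB page, a 64-byte coherence unit, a three-state (invalid/shared/exclusive) per-line protocol and broadcast invalidation. All class and method names are hypothetical. Only line states are tracked; data values, home nodes and page replacement are left out. Treat it as an illustration of the general simple COMA idea, not the paper's actual design.

```python
from enum import Enum

PAGE_SIZE = 4096   # assumed page size in bytes (illustrative)
LINE_SIZE = 64     # assumed coherence unit (cache line) in bytes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


class LineState(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"


class Node:
    """A hypothetical node whose local memory frames act as a page-granular cache."""

    def __init__(self, node_id, num_frames):
        self.node_id = node_id
        self.free_frames = list(range(num_frames))
        self.page_map = {}     # software-managed: global page number -> local frame
        self.line_state = {}   # hardware-managed: (page, line) -> LineState
        self.page_faults = 0
        self.line_misses = 0


class SimpleComaSystem:
    def __init__(self, num_nodes, frames_per_node):
        self.nodes = [Node(i, frames_per_node) for i in range(num_nodes)]

    def _map_page(self, node, page):
        # Software path (e.g. an OS page-fault handler): allocate a whole local
        # frame for the page and mark every line in it invalid.
        if not node.free_frames:
            raise MemoryError("no free frame; page replacement is not modeled")
        node.page_faults += 1
        node.page_map[page] = node.free_frames.pop()
        for line in range(LINES_PER_PAGE):
            node.line_state[(page, line)] = LineState.INVALID

    def _invalidate_others(self, node_id, page, line):
        # Hardware path: a write invalidates other nodes' copies of one line only.
        for other in self.nodes:
            if other.node_id != node_id and (page, line) in other.line_state:
                other.line_state[(page, line)] = LineState.INVALID

    def _downgrade_others(self, node_id, page, line):
        # Hardware path: a read turns any exclusive copy elsewhere into a shared one.
        for other in self.nodes:
            if other.node_id != node_id and other.line_state.get((page, line)) is LineState.EXCLUSIVE:
                other.line_state[(page, line)] = LineState.SHARED

    def access(self, node_id, addr, write=False):
        node = self.nodes[node_id]
        page, offset = divmod(addr, PAGE_SIZE)
        line = offset // LINE_SIZE
        if page not in node.page_map:          # page-granularity miss: software
            self._map_page(node, page)
        state = node.line_state[(page, line)]
        if write and state is not LineState.EXCLUSIVE:
            node.line_misses += 1              # line-granularity miss: hardware
            self._invalidate_others(node_id, page, line)
            node.line_state[(page, line)] = LineState.EXCLUSIVE
        elif not write and state is LineState.INVALID:
            node.line_misses += 1
            self._downgrade_others(node_id, page, line)
            node.line_state[(page, line)] = LineState.SHARED


if __name__ == "__main__":
    system = SimpleComaSystem(num_nodes=2, frames_per_node=4)
    system.access(0, 0x1000)              # node 0: page fault, then line miss
    system.access(0, 0x1040)              # same page, next line: line miss only
    system.access(1, 0x1000, write=True)  # node 1 maps the page; node 0's line 0 is invalidated
    for n in system.nodes:
        print(n.node_id, n.page_faults, n.line_misses)  # prints "0 1 2", then "1 1 1"
```

The point of the split is that the expensive, rare event (allocating a page frame) is handled in software. The frequent event (a single line miss or upgrade) stays in hardware, so the hardware only needs per-line state, not allocation or replacement logic.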


Cache-Only Memory Architectures
TLDR
The authors explain the functionality, architecture, performance, and complexity of COMA systems, compare COMA to traditional nonuniform memory access (NUMA) systems, and describe proposed improvements in NUMA systems that target the same performance obstacles as COMA.
Research Feature Cache-Only Memory Architectures
TLDR
The article explains the functionality, architecture, performance, and complexity of COMA systems, outlines different COMA designs, compares COMA to traditional cache-coherent non-uniform memory access (NUMA) systems, and describes proposed improvements in NUMA systems that target the same performance obstacles as COMA.
The Impact of Memory Organization in Hybrid DSM
TLDR
This study compares the design issues and performance consequences of adopting, in hybrid DSM, four memory organizations inspired by existing architectures: CC-NUMA, RC-NUMA, S-COMA, and COMA.
PRISM: an integrated architecture for scalable shared memory
TLDR
Adaptive, run-time policies that take advantage of PRISM's ability to dynamically configure shared memory pages with different behaviors significantly outperform pure CC-NUMA or Simple-COMA configurations and are usually within 10% of optimal performance.
Reducing Remote Conflict Misses in Shared-Memory Multiprocessors: NUMA with Remote Cache and COMA
  • Computer Science
  • 2007
TLDR
To compare the performance of the two organizations for the same amount of total memory, a model of data sharing is introduced that uses three data sharing patterns: replication, read-mostly migration, and read-write migration.
2 CC-NUMA and COMA-F Architectures
Distributed shared memory multiprocessors with cache coherent non-uniform memory architectures (CC-NUMA) have become popular in the memory design of multiprocessors in recent years. The shared data
Evaluating the Memory Performance of a ccNUMA System
TLDR
This work presents a detailed memory performance analysis of a particular ccNUMA system (the SGI Origin 2000), together with a new memory profiling tool and a new set of microbenchmark codes, called snbench, that make such fine-grained memory performance analysis possible.
COMA: An Opportunity for Building Fault-Tolerant Scalable Shared Memory Multiprocessors
TLDR
Cache-Only Memory Architectures (COMA) are good candidates for building fault-tolerant SSMMs, and a backward error recovery strategy can be implemented without significant hardware modification to previously proposed COMAs by exploiting their standard replication mechanisms and extending the coherence protocol to transparently manage recovery data.
Simple COMA Shared Memory and the RS/6000 SP White Paper
TLDR
The Simple COMA shared memory architecture and the potential implementation of this architecture on the IBM RS/6000 SP parallel computer are explained.
A Dual Address Space Architecture: Implementation and Evaluation
TLDR
This dissertation proposes changes to a hardware-based DSM architecture that allow users to use two address spaces to gain the scalability of distributed architectures while retaining the benefits of the shared address space architecture.

References

Showing 1-10 of 26 references
Simple COMA node implementations
TLDR
The authors introduce the idea of a simple COMA architecture, a hybrid with hardware support only for frequently used functionality; because of its simplicity, it should be quick and cheap to develop and engineer.
Experimental comparison of memory management policies for NUMA multiprocessors
TLDR
The results show that some of the memory management policies implemented in the system can improve the performance of programs written using the simpler uniform memory access (UMA) programming model, but no single policy appears to be the best across the set of test applications.
The directory-based cache coherence protocol for the DASH multiprocessor
TLDR
The design of the DASH coherence protocol is presented, and the ways it addresses correctness, performance, and protocol complexity are discussed and compared with the IEEE Scalable Coherent Interface protocol.
Memory coherence in shared virtual memory systems
TLDR
Both theoretical and practical results show that the memory coherence problem can indeed be solved efficiently on a loosely coupled multiprocessor.
SPLASH: Stanford parallel applications for shared-memory
TLDR
This work presents the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems, and describes the applications currently in the suite in detail.
Implementation and performance of Munin
TLDR
This work evaluates the implementation of Munin and describes the execution of two Munin programs that achieve performance within ten percent of message passing implementations of the same programs.
Evaluating the memory overhead required for COMA architectures
  • T. Joe, J. Hennessy
  • Computer Science
    Proceedings of the 21st International Symposium on Computer Architecture
  • 1994
TLDR
Simulation data shows that the frequency of data reshuffling is sensitive to the allocation policy and associativity of the memory but is relatively unaffected by the block size chosen, and that data replication in the attraction memory is important for good performance, although most of the gains can be achieved through replication in the processor caches.
DDM - A Cache-Only Memory Architecture
TLDR
The Data Diffusion Machine (DDM), a cache-only memory architecture (COMA) that relies on a hierarchical network structure, is described and simulated performance results are presented.
Tempest and Typhoon: user-level shared memory
TLDR
The authors simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably to an all-hardware DirNNB cache-coherence protocol for five shared-memory programs.
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
TLDR
A new technique for evaluating cache-coherent, shared-memory computers is developed, along with the Wisconsin Wind Tunnel (WWT), which correctly interleaves target machine events and calculates target program execution time.