
The Scalable Coherent Interface (SCI) - IEEE Communications Magazine

David B. Gustavson
There is rapidly increasing demand for very-high-performance networked communication for workstation clusters, distributed databases, multiprocessors, industrial data-acquisition and control systems, shared access to distributed data, and so on. Higher-bandwidth hardware using the traditional protocols is not sufficient. Even at 100 Mb/s, and certainly at 250 Mb/s, throughput for many applications is severely limited by delays due to architecturally induced inefficiencies, such as software overheads…
1 Citation

A Tagless Coherence Directory
Simulations of commercial and scientific workloads indicate that TL has no statistically significant impact on performance and incurs only a 2.5% increase in bandwidth utilization; analytical modelling predicts that TL continues to scale well up to at least 1024 cores.

References

Memory Channel Network for PCI
MC implements a form of virtual shared memory that permits applications to completely bypass the operating system and perform cluster communication directly from the user level, and drops communication latency and overhead by up to three orders of magnitude.

The MIT Alewife Machine
The Alewife architecture is described, concentrating on the machine's novel hardware features, including LimitLESS directories and the rapid-context-switching processor.

The Stanford FLASH multiprocessor
The architecture of FLASH and MAGIC is presented, the base cache-coherence and message-passing protocols are discussed, and latency and occupancy numbers derived from the system-level simulator and the Verilog code are given.

Using cache memory to reduce processor-memory traffic
It is demonstrated that a cache exploiting primarily temporal locality (look-behind) can indeed reduce traffic to memory greatly, and an elegant solution to the cache coherency problem is introduced.

Virtual memory mapped network interface for the SHRIMP multicomputer
A low-latency, high-bandwidth, virtual memory-mapped network interface for the SHRIMP multicomputer project at Princeton University is described, demonstrating that the approach can reduce message-passing overhead to a few user-level instructions.

Protected, user-level DMA for the SHRIMP network interface
The UDMA mechanism uses existing virtual memory translation hardware to perform permission checking and address translation without kernel involvement when initiating DMA transfers of input/output data, with full protection, at a cost of only two user-level memory references.

Memory as a network abstraction
An approach to interprocess communication over fast networks is described, based on the assumption that the essence of the problem is the network abstraction, that is, what model the…

DDM - A Cache-Only Memory Architecture
The Data Diffusion Machine (DDM), a cache-only memory architecture (COMA) that relies on a hierarchical network structure, is described, and simulated performance results are presented.

Transactional memory: architectural support for lock-free data structures
Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.

Multiple reservations and the Oklahoma update
A multiple-reservation approach that allows atomic updates of multiple shared variables and simplifies concurrent and nonblocking code for managing shared data structures such as queues and linked lists is presented.