Zoran Radovic

This paper identifies node affinity as an important property for scalable general-purpose locks. Nonuniform communication architectures (NUCAs), for example CC-NUMAs built from a few large nodes or from chip multiprocessors (CMPs), have a lower penalty for reading data from a neighbor's cache than from a remote cache. Lock implementations that encourage…
The advances in semiconductor technology have set the shared-memory server trend towards processors with multiple cores per die and multiple threads per core. We believe that this technology shift forces a reevaluation of how to interconnect multiple such chips to form larger systems. This paper argues that by adding support for coherence traps in…
Scalable parallel computers are often nonuniform communication architectures (NUCAs), where the access time to other processors' caches varies with their physical location. Still, few attempts to exploit cache-to-cache communication locality have been made. This paper introduces a new kind of synchronization primitive (lock-unlock) that favors neighboring…
Software implementations of shared memory are still far behind the performance of hardware-based shared memory and are not viable options for most fine-grain shared-memory applications. The major source of their inefficiency is the cost of interrupt-based asynchronous protocol processing, not the actual network latency. As the raw hardware…
Fine-grained software-based distributed shared memory (SW-DSM) systems typically maintain coherence with in-line checking code at load and store operations to shared memory. The instrumentation overhead of this added checking code can be severe. This paper (1) shows that most of the instrumentation overhead in the fine-grained SW-DSM system DSZOOM is…
Fine-grained software-based distributed shared memory (SW-DSM) systems typically maintain coherence with in-line checking code at load and store operations to shared memory. The instrumentation overhead of this added checking code can be severe. This paper (1) shows that most of the instrumentation overhead in the fine-grained DSZOOM SW-DSM system is store…
An efficient and robust instrumentation tool (or compiler support) is necessary for an efficient implementation of fine-grain software-based shared memory systems (SW-DSMs). The DSZOOM system, developed by the Uppsala Architecture Research Team (UART) at Uppsala University, is a sequentially consistent SW-DSM originally developed using EEL (Executable…