This paper discusses the design concepts of a lock mechanism for a Parallel Inference Machine (the PIM/c prototype) and investigates its performance in detail. Lock operations are extremely frequent on the PIM, yet lock contention rarely occurs during normal memory usage. The mechanism is therefore designed to minimize lock overhead in the absence of contention. It does so with an invalidation lock mechanism, which exploits the exclusive state of the snooping cache and does not broadcast the locked address. Experimental results demonstrate the benefits of the mechanism when lock contention is low and confirm that, in most cases, it works well on the PIM. However, the mechanism is also found to degrade performance when a locked address is accessed by multiple processing elements (PEs) in a tightly-coupled multi-processor (TCMP). This happens because data shared by all the PEs, such as the flags for inter-PE communication, may be accessed by several PEs at the same time, generating heavy contention. The paper also shows that combining a register-based broadcasting facility with the proposed lock mechanism can solve this problem.
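
The trade-off described above can be illustrated with a toy model. The sketch below is not the PIM/c protocol: the class `ToyBus`, the three cache states, and the rule that every refused lock attempt costs one bus transaction are all assumptions made for illustration. It only shows why a lock taken on a line already held in the exclusive state needs no bus traffic, while several PEs retrying on one locked address generate a transaction per attempt.

```python
from enum import Enum


class State(Enum):
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2


class ToyBus:
    """Hypothetical bus model that only counts transactions (not the PIM/c bus)."""

    def __init__(self, n_pes):
        self.caches = [dict() for _ in range(n_pes)]  # per-PE map: addr -> State
        self.locked_by = {}                           # addr -> id of the PE holding the lock
        self.transactions = 0

    def lock(self, pe, addr):
        """Try to lock addr on behalf of pe; return True on success."""
        holder = self.locked_by.get(addr)
        if holder is not None and holder != pe:
            # Assumption: a refused attempt still costs one bus transaction.
            self.transactions += 1
            return False
        if self.caches[pe].get(addr, State.INVALID) is not State.EXCLUSIVE:
            # One invalidation transaction gives this PE exclusive ownership.
            self.transactions += 1
            for other, cache in enumerate(self.caches):
                if other != pe:
                    cache[addr] = State.INVALID
            self.caches[pe][addr] = State.EXCLUSIVE
        # Line is exclusive locally, so the lock itself needs no bus traffic.
        self.locked_by[addr] = pe
        return True

    def unlock(self, pe, addr):
        if self.locked_by.get(addr) == pe:
            del self.locked_by[addr]


# No contention: PE 0 locks and unlocks the same address ten times.
bus = ToyBus(n_pes=4)
for _ in range(10):
    assert bus.lock(0, 0x100)
    bus.unlock(0, 0x100)
uncontended = bus.transactions

# Contention: PE 0 holds a shared flag while PEs 1-3 retry five times each.
bus = ToyBus(n_pes=4)
bus.lock(0, 0x200)
before = bus.transactions
for pe in (1, 2, 3):
    for _ in range(5):
        bus.lock(pe, 0x200)
contended = bus.transactions - before

print(uncontended, contended)  # prints "1 15"
```

In this toy model the uncontended loop pays for a single invalidation and then runs without touching the bus, whereas the contended case pays for every retry, which is the kind of pattern the abstract attributes to flags shared by all PEs.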