James R. Goodman

Serialization of threads due to critical sections is a fundamental bottleneck to achieving high performance in multithreaded programs. Dynamically, such serialization may be unnecessary because these critical sections could have safely executed concurrently without locks. Current processors cannot fully exploit such parallelism because they do not have(More)
In 1987 we were working at the University of Wisconsin-Madison with Jim Smith, J. T. Hsieh, Koujuch Liou and Andrew Pleszkun on PIPE [4], an unorthodox 'decoupled access-execute processor.' The driving innovation of PIPE was the separation of instructions dealing with memory through a separate and independent instruction stream racing ahead, initiating load(More)
This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a decomposition of execution time, we show that for modern processors that employ aggressive memory latency(More)
The primitives make use of synchronization bits (syncbits) to provide a simple mechanism for mutual exclusion. The proposed implementation of the primitives includes efficient (i.e. local) busy-waiting for syncbits. In addition, a hardware-supported mechanism for maintaining a first-come first-serve queue of requests for a syncbit is proposed. This queueing(More)
The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches which depend heavily on spatial locality (look-ahead) for their performance are(More)
A multiprocessor cache memory system is described that supplies data to the processor based on virtual addresses, but maintains consistency in the main memory, both across caches and across virtual address spaces. Pages in the same or different address spaces may be mapped to share a single physical page. The same hardware is used for maintaining(More)
This paper proposes a set of efficient primitives for process synchronization in multiprocessors. The only assumptions made in developing the set of primitives are that hardware combining is not implemented in the interconnect, and (in one case) that the interconnect supports broadcast. The primitives make use of synchronization bits (syncbits) to provide(More)