Larry Rudolph

Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that …
This paper proposes dynamic cache partitioning amongst simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches. Since memory reference characteristics of processes/threads can change over time, our method collects the cache miss characteristics of processes/threads at run-time. Also, …
We present the design for the NYU Ultracomputer, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements. This machine uses an enhanced message switching network with the geometry of an Omega-network to approximate the ideal behavior of Schwartz's paracomputer model of computation and to implement efficiently the …
The memory hierarchy in modern computing systems is typically time-shared and space-shared amongst multiple processes and threads, some of which execute simultaneously. Memory contention can significantly degrade the performance of running processes. Cache hit counters found in modern microprocessors provide a limited picture as to the memory needs of …
BACKGROUND: Job characteristics may constitute a barrier to return-to-work (RTW) after compensated disabling low back pain (LBP). This study examines the impact of psychosocial job factors on time to RTW separately during the acute and subacute/chronic disability phases. METHODS: This is a retrospective cohort study of 433 LBP workers' compensation …
Parallel job scheduling has gained increasing recognition in recent years as a distinct area of study. However, there is concern about the divergence of theory and practice in the field. We review theoretical research in this area, and recommendations based on recent results. This is contrasted with a proposal for standard interfaces among the components of …
A periodic sorting network consists of a sequence of identical blocks. In this paper, the periodic balanced sorting network, which consists of log n blocks, is introduced. Each block, called a balanced merging block, merges elements on the even input lines with those on the odd input lines. The periodic balanced sorting network sorts …
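The structure described in this abstract (log n identical blocks applied in sequence) can be sketched in a few lines. The abstract is truncated and does not spell out the comparator layout inside a block, so the layout below is an assumption based on the commonly described balanced merging block: in each phase, every group of lines compares line i with its mirror line in the group, and the group size halves from n down to 2. The function names `balanced_merging_block` and `periodic_balanced_sort` are hypothetical, and the input length is assumed to be a power of two.

```python
def balanced_merging_block(values):
    """Apply one block of comparators in place.

    Assumed layout (not given in the truncated abstract): log2(n) phases;
    in each phase the lines are split into groups, and line i of a group
    is compared with its mirror line in the same group.
    """
    n = len(values)
    size = n
    while size >= 2:
        for start in range(0, n, size):
            for i in range(size // 2):
                lo = start + i
                hi = start + size - 1 - i
                # Comparator: smaller value goes to the lower-numbered line.
                if values[lo] > values[hi]:
                    values[lo], values[hi] = values[hi], values[lo]
        size //= 2


def periodic_balanced_sort(values):
    """Return a sorted copy by applying the same block log2(n) times."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    for _ in range(n.bit_length() - 1):  # log2(n) identical blocks
        balanced_merging_block(data)
    return data
```

Because every block is identical, a hardware realization could in principle reuse a single block and feed its outputs back to its inputs, which is the appeal of a periodic design; the loop in `periodic_balanced_sort` mirrors that reuse.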
In a snoopy cache multiprocessor system, each processor has a cache in which it stores blocks of data. Each cache is connected to a bus used to communicate with the other caches and with main memory. Each cache monitors the activity on the bus and in its own processor and decides which blocks of data to keep and which to discard. For several of the proposed …