Memory access buffering in multiprocessors
It is shown that the logical problem of buffering is directly related to the problem of synchronization, and a simple model is presented to evaluate the performance improvement resulting from buffering.
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
This work proposes to adapt the number of prefetched blocks according to a dynamic measure of prefetching effectiveness, and shows significant reductions in the read penalty and the overall execution time.
Parallel asynchronous algorithms for discrete data
Many problems in the area of symbolic computing can be solved by iterative algorithms. Implementations of these algorithms on multiprocessors can be synchronous or asynchronous. Asynchronous
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
Simulations of this adaptive scheme show reductions in the number of read misses, the read penalty, and the execution time of up to 78%, 58%, and 25%, respectively.
SlackSim: a platform for parallel simulations of CMPs on CMPs
The concept of slack simulation is introduced, in which the Pthreads simulating different target cores do not synchronize after each simulated cycle but are instead given some slack, which is the difference in cycles between the simulated times of any two target cores.
CPPC: Correctable parity protected cache
A new reliable write-back cache called Correctable Parity Protected Cache (CPPC) is proposed. It adds error-correction capability to a parity-protected cache and provides a high level of reliability, with overheads lower than those of SECDED and two-dimensional parity.
The effectiveness of SRAM network caches in clustered DSMs
  • Adrian Moga, M. Dubois
  • Computer Science
    Proceedings Fourth International Symposium on…
  • 31 January 1998
Small and fast SRAM network caches are explored as a means to reduce the remote stalls and capacity traffic of multiprocessor clusters, and a novel and scalable method is proposed to control the page cache by integrating page relocation mechanisms into the network victim cache.
The detection and elimination of useless misses in multiprocessors
A new classification of misses in shared-memory multiprocessors based on interprocessor communication is introduced, which identifies the set of essential misses, i.e., the smallest set of misses necessary for correct execution.
Assisted Execution
Simulation results on several SPEC95 benchmarks show that sequential and stride prefetching implemented with nanothread technology perform just as well as ideal hardware prefetchers.
Virtual-address caches. Part 1: problems and solutions in uniprocessors
This survey introduces the problems and discusses solutions in the context of single-processor systems, to catalog all solutions, past and present, and to identify technology trends and attractive future approaches.