PAI: A Lightweight Mechanism for Single-Node Memory Recovery in DSM Servers

Abstract

Several recent studies identify the memory system as the most frequent source of hardware failures in commercial servers. Techniques that protect the memory system must continue to service memory requests despite hardware failures. Furthermore, to support existing operating systems, the physical address space must be retained following reconfiguration. Existing techniques either incur a high performance overhead or require pervasive hardware changes to support transparent recovery. In this paper, we propose physical address indirection (PAI), a lightweight, hardware-based mechanism for memory system failure recovery. PAI provides a simple hardware mapping that transparently reconstructs affected data in alternate locations while maintaining high performance and avoiding physical address changes. Using full-system simulation of commercial and scientific workloads on a 16-node distributed shared memory (DSM) server, we show that prior techniques have average degraded-mode performance losses of 14% and 51% for commercial and scientific workloads, respectively. With PAI's data-swap reconstruction, the same workloads have average performance losses of 1% and 32%.
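The core idea of physical address indirection can be illustrated with a toy model. The sketch below assumes page-granularity remapping and a simple lookup table consulted on every access. The names `PaiTable`, `Memory`, `translate`, `redirect`, and `recover` are hypothetical, and the source of the reconstructed data (here just a saved copy) is an assumption. This is not the paper's hardware design, and it does not model data-swap reconstruction specifically. It only shows how an OS-visible physical address can stay fixed while its backing location moves after a failure.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed remapping granularity (hypothetical)


@dataclass
class PaiTable:
    """Hypothetical indirection table: OS-visible page -> backing page."""
    remap: dict = field(default_factory=dict)

    def translate(self, paddr: int) -> int:
        page, offset = divmod(paddr, PAGE_SIZE)
        backing = self.remap.get(page, page)  # identity mapping unless remapped
        return backing * PAGE_SIZE + offset

    def redirect(self, failed_page: int, spare_page: int) -> None:
        self.remap[failed_page] = spare_page


class Memory:
    """Toy physical memory whose accesses all pass through the PAI table."""

    def __init__(self, num_pages: int):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.table = PaiTable()

    def read(self, paddr: int) -> int:
        page, off = divmod(self.table.translate(paddr), PAGE_SIZE)
        return self.pages[page][off]

    def write(self, paddr: int, value: int) -> None:
        page, off = divmod(self.table.translate(paddr), PAGE_SIZE)
        self.pages[page][off] = value

    def recover(self, failed_page: int, spare_page: int, rebuilt: bytes) -> None:
        # Place the rebuilt contents in a spare location, then redirect the
        # failed page there; the OS keeps using the same physical addresses.
        if len(rebuilt) != PAGE_SIZE:
            raise ValueError("rebuilt data must be exactly one page")
        self.pages[spare_page][:] = rebuilt
        self.table.redirect(failed_page, spare_page)


# Demo: write, "lose" page 0, recover it into page 3, read the same address.
mem = Memory(num_pages=4)
mem.write(0x10, 7)
saved = bytes(mem.pages[0])  # stands in for data rebuilt from redundancy
mem.pages[0][:] = bytes(PAGE_SIZE)  # simulate the failed page's contents being lost
mem.recover(failed_page=0, spare_page=3, rebuilt=saved)
print(mem.read(0x10), hex(mem.table.translate(0x10)))
```

After recovery, reads of address `0x10` still return `7`, but the table now routes them to page 3. Keeping the indirection in a single lookup is what lets software above it stay unaware of the move, which matches the abstract's goal of avoiding physical address changes.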

DOI: 10.1109/PRDC.2007.37

8 Figures and Tables

Cite this paper

@article{Kim2007PAIAL,
  title={PAI: A Lightweight Mechanism for Single-Node Memory Recovery in DSM Servers},
  author={Jangwoo Kim and Jared C. Smolens and Babak Falsafi and James C. Hoe},
  journal={13th Pacific Rim International Symposium on Dependable Computing (PRDC 2007)},
  year={2007},
  pages={298-305}
}