Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy
Data prefetching, advanced cache replacement policy, and memory access scheduling are incorporated in modern processors. Typically, each technique holds recently accessed locations independently and controls the memory subsystem based on the prediction of future memory access. Unfortunately, these specific optimizations often increase the implementation cost, decrease the system performance, and reduce scalability of the processor chip. In this paper, we propose Unified Memory Optimizing (UMO) architecture to resolve these problems. The UMO architecture is a control architecture for the memory subsystem and takes a unified approach to data prefetching, cache management, and memory access scheduling. On this architecture, we propose a Map-based Unified Memory Subsystem Controller (MUMSC) that is composed of DRAM-Aware prefetching, Prefetch-Aware Cache Line Promotion, and lightweight memory controllers. MUMSC is implemented as the per-core resource to predict future memory access from the per-core memory access history. MUMSC realizes a scalable and high performance memory subsystem with a reasonable hardware cost. We evaluate MUMSC using a multi-core simulator with multi-programmed workloads of SPEC CPU2006. The results of the simulation show that the system throughput using MUMSC outperforms a combination of state-of-the-art enhancement techniques by 11.5% without increasing the hardware costs and the complexity of the design of the shared resources.