Architectural and Operating System Support for Virtual Memory

@inproceedings{Bhattacharjee2017ArchitecturalAO,
  title={Architectural and Operating System Support for Virtual Memory},
  author={Abhishek Bhattacharjee and Daniel Lustig},
  booktitle={Architectural and Operating System Support for Virtual Memory},
  year={2017}
}
This book provides computer engineers, academic researchers, new graduate students, and seasoned practitioners an end-to-end overview of virtual memory. We begin with a recap of foundational concepts and discuss not only state-of-the-art virtual memory hardware and software support available today, but also emerging research trends in this space. The span of topics covers processor microarchitecture, memory systems, operating system design, and memory allocation. We show how efficient virtual… 
Improving and complementing virtual memory using hardware techniques
TLDR
This thesis proposes a range of hardware mechanisms to improve TLB performance, along with security mechanisms complementary to those provided by virtual memory, and proposes low-overhead mechanisms to achieve this.
CoPTA: Contiguous Pattern Speculating TLB Architecture
TLDR
This paper proposes CoPTA, a technique that speculates on the memory address translation upon a TLB miss to hide the page table walk (PTW) latency, and shows that the operating system tends to map contiguous virtual memory pages to contiguous physical pages.
Rebooting Virtual Memory with Midgard
TLDR
This work proposes Midgard, an intermediate address space between the virtual and the physical address spaces, to mitigate address translation overheads without program-level changes, and shows that instead of amplifying address translation overheads, memory hierarchies with large caches can reduce address translation overheads.
Dancing in the Dark: Profiling for Tiered Memory
TLDR
This paper evaluates different methods of memory-access collection and proposes a hybrid tiered-memory approach that offers comprehensive visibility into TMA.
Compendia: reducing virtual-memory costs via selective densification
TLDR
It is argued that the radix trees in use today are too sparse for modern workloads, so many of their overheads are unnecessary; memory accesses per walk are reduced by 27%, or 56% for virtualised systems, without significant memory overhead.
A Resizable C++ Container using Virtual Memory
TLDR
This work presents a thread-safe no-copy resizable C++ container class that can be used to store shared data among threads of a program on a shared-memory system.
TransForm: Formally Specifying Transistency Models and Synthesizing Enhanced Litmus Tests
TLDR
The TransForm framework features an axiomatic vocabulary for formally specifying memory transistency models (MTMs) and includes a synthesis engine to support the automated generation of litmus tests enhanced with MTM features when supplied with a TransForm MTM specification.
MIND: In-Network Memory Management for Disaggregated Data Centers
TLDR
This work shows that emerging programmable network switches can enable an efficient shared memory abstraction for disaggregated architectures by placing memory management logic in the network fabric, and realizes these insights in MIND, an in-network memory management unit for rack-scale disaggregation.
Scalable Distributed Last-Level TLBs Using Low-Latency Interconnects
TLDR
The approach, which is dubbed Nocstar (NOCs for scalable TLB architecture), combines the high hit rates of shared TLBs with low access times of private L2 TLBs, enabling significant system performance benefits.
The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology
TLDR
This work presents a thorough power, performance, and efficiency analysis of the RISC-V ISA targeting baseline “application class” functionality, i.e., supporting the Linux OS and its application environment, based on the authors' open-source single-issue in-order implementation of the 64-bit ISA variant (RV64GC), called Ariane.

References

A look at several memory management units, TLB-refill mechanisms, and page table organizations
TLDR
Comparing several virtual memory designs, including combinations of hierarchical and inverted page tables on hardware-managed and software-managed translation lookaside buffers (TLBs), shows that systems are fairly sensitive to TLB size and that VM overhead is roughly twice what was thought.
Heterogeneous memory management for embedded systems
TLDR
A compiler strategy that automatically partitions the data among the memory units of software-exposed heterogeneous memory is presented, and it is shown that this strategy is optimal among all static partitions for global and stack data, and a good heuristic for heap data.
Virtual memory in contemporary microprocessors
Here, we consider the memory management designs of a sampling of six recent processors, focusing primarily on their architectural differences, and hint at optimizations that someone designing or
COATCheck: Verifying Memory Ordering at the Hardware-OS Interface
TLDR
This work introduces the term transistency model to describe the superset of consistency that captures all translation-aware sets of ordering rules, and presents COATCheck, which can efficiently analyze interesting and important memory ordering scenarios for modern, high-performance, out-of-order processors.
Observations and opportunities in architecting shared virtual memory for heterogeneous systems
TLDR
This work analyzes, using real-system measurements, shared virtual memory across the CPU and an integrated GPU, and presents a detailed measurement study of a commercially available integrated APU that illustrates these effects and motivates future research opportunities.
SpecTLB: A mechanism for speculative address translation
TLDR
This work presents a novel device, the SpecTLB, that exploits the predictable behavior of reservation-based physical memory allocators to interpolate address translations and effectively enables the use of small pages to achieve fine-grained allocation and protection, while avoiding the associated latency penalties of small pages.
Redundant Memory Mappings for fast access to large memories
TLDR
Redundant Memory Mappings (RMM) is proposed, which leverages ranges of pages and provides an efficient, alternative representation of many virtual-to-physical mappings, reducing the overhead of virtual memory to less than 1% on average.
Efficient virtual memory for big memory servers
TLDR
This work proposes mapping part of a process's linear virtual address space with a direct segment, while page mapping the rest of the virtual address space, to remove the TLB miss overhead for big-memory workloads.
Design Tradeoffs for Software-Managed TLBs
TLDR
This work explores software-managed TLB design tradeoffs and their interaction with a range of monolithic and microkernel operating systems and explores TLB performance for benchmarks running on a MIPS R2000-based workstation.
A Primer on Memory Consistency and Cache Coherence
TLDR
This primer aims to provide readers with a basic understanding of consistency and coherence, and presents both high-level concepts and specific, concrete examples from real-world systems.