Remote memory in the age of fast networks

@inproceedings{aguilera2017remote,
  title={Remote memory in the age of fast networks},
  author={Marcos K. Aguilera and Nadav Amit and Irina Calciu and Xavier Deguillard and Jayneel Gandhi and Pratap Subrahmanyam and Lalith Suresh and Kiran Tati and Rajesh Venkatasubramanian and Michael Yung Chung Wei},
  booktitle={Proceedings of the 2017 Symposium on Cloud Computing},
  year={2017}
}
  • Published 24 September 2017
As the latency of the network approaches that of memory, it becomes increasingly attractive for applications to use remote memory---random-access memory at another computer that is accessed using the virtual memory subsystem. This is an old idea whose time has come, in the age of fast networks. To work effectively, remote memory must address many technical challenges. In this paper, we enumerate these challenges, discuss their feasibility, explain how some of them are addressed by recent work… 
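The mechanism the abstract describes, backing part of an application's memory with RAM on another machine and paging on demand through the virtual memory subsystem, can be sketched in miniature. The class below is an illustrative toy with hypothetical names, not any system from the paper: a small LRU-managed local cache stands in for the node's own memory, a dict stands in for the remote machine's RAM, and a miss triggers a simulated page-in with write-back eviction.

```python
from collections import OrderedDict

class RemoteMemoryPager:
    """Toy model of paging between local RAM and a remote memory store."""

    def __init__(self, local_capacity, remote):
        self.local = OrderedDict()   # page_id -> bytes, kept in LRU order
        self.capacity = local_capacity
        self.remote = remote         # page_id -> bytes ("another machine")
        self.faults = 0

    def read(self, page_id):
        if page_id in self.local:
            # Local hit: refresh the page's LRU position.
            self.local.move_to_end(page_id)
            return self.local[page_id]
        # Page fault: fetch the page over the "network".
        self.faults += 1
        data = self.remote.pop(page_id)
        if len(self.local) >= self.capacity:
            # Evict the least recently used local page back to remote memory.
            victim, vdata = self.local.popitem(last=False)
            self.remote[victim] = vdata
        self.local[page_id] = data
        return data
```

A real system would intercept hardware page faults and issue network reads instead of dict lookups, but the hit/miss/evict structure is the same.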


Can far memory improve job throughput?

It is found that while far memory is not a panacea, for memory-intensive workloads it can provide performance improvements on the order of 10% or more even without changing the total amount of memory available.

More Exploration to Composable Infrastructure: The Application and Analysis of Composable Memory

This is the first design to provide significant memory extension without any software modification, and it includes large-scale performance evaluations against other state-of-the-art network-based remote-memory systems.

Systems for Memory Disaggregation: Challenges & Opportunities

This report examines several recent memory disaggregation systems and studies the factors that guide their design, such as the interface through which the memory is exposed to the application, their runtime design and the optimizations used to retain near-native application performance, and the approaches they employ to manage cluster memory and maximize utilization.

The Case for Physical Memory Pools: A Vision Paper

It is argued that creating physical memory pools is essential for cheaper and more efficient cloud computing infrastructures, and the research challenges to implement these structures are identified.

Thinking More about RDMA Memory Semantics

It is found that RDMA network performance can be improved by considering the vector I/O mechanism, the performance asymmetry between sequential and random access, I/O consolidation, NUMA effects, and the atomic operations provided by the underlying hardware.

A Survey on the Challenges of Implementing Physical Memory Pools

This article identifies enabling technologies for physical memory pools, such as OS design, distributed shared memory structures, and virtualization, with regard to their relevance and impact on eliminating memory limits, and discusses the challenges for physical memory pools that can be shared by multiple servers.

Effectively Prefetching Remote Memory with Leap

Memory disaggregation over RDMA can improve the performance of memory-constrained applications by replacing disk swapping with remote memory accesses. However, state-of-the-art memory disaggregation…
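Leap's central idea of detecting a dominant access trend in a recent window and prefetching along it can be sketched roughly as follows. This is a simplified illustration with hypothetical function names, not Leap's actual kernel implementation: a stride is accepted only if it holds a strict majority among recent access deltas.

```python
from collections import Counter, deque

def detect_stride(window):
    """Return the page-access stride if one delta holds a strict
    majority in the window of recent accesses, else None."""
    accesses = list(window)
    deltas = [b - a for a, b in zip(accesses, accesses[1:])]
    if not deltas:
        return None
    stride, count = Counter(deltas).most_common(1)[0]
    return stride if count * 2 > len(deltas) else None

def prefetch_candidates(window, depth=2):
    """Pages to prefetch after the latest access, or [] if no clear trend."""
    stride = detect_stride(window)
    if stride is None or stride == 0:
        return []  # no majority trend: prefetching would likely pollute
    last = list(window)[-1]
    return [last + stride * i for i in range(1, depth + 1)]
```

Majority-based detection makes the prefetcher robust to a few out-of-trend accesses, while random access patterns yield no majority and so trigger no wasted prefetches.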

Hydra: Resilient and Highly Available Remote Memory

Hydra, a low-latency, low-overhead, and highly available resilience mechanism for remote memory, is presented together with CodingSets, a novel coding-group placement algorithm for erasure-coded data that provides load balancing while reducing the probability of data loss under correlated failures by an order of magnitude.
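The erasure-coding idea behind such resilience mechanisms, splitting a page across nodes with parity so any one lost split is recoverable, can be illustrated with single XOR parity. This is a much weaker stand-in for the codes real systems use, and all names here are hypothetical.

```python
from functools import reduce

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_page(page, k):
    """Split a page into k equal data splits plus one XOR parity split.
    Any single lost split can be rebuilt from the remaining k."""
    assert len(page) % k == 0
    n = len(page) // k
    splits = [page[i * n:(i + 1) * n] for i in range(k)]
    return splits + [reduce(xor, splits)]

def decode_page(splits, lost):
    """Rebuild the original page when split index `lost` is missing
    (splits[lost] is None); the last split is the parity."""
    survivors = [s for i, s in enumerate(splits) if i != lost]
    rebuilt = reduce(xor, survivors)      # XOR of survivors = lost split
    data = splits[:-1]
    if lost < len(data):
        data[lost] = rebuilt
    return b"".join(data)
```

Placing each split on a different node means a single node failure costs one split per page, which the parity covers; production codes tolerate more simultaneous losses at higher encoding cost.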

Rethinking software runtimes for disaggregated memory

A new software runtime for disaggregated memory is implemented that improves average memory access time by 1.7-5X and reduces dirty data amplification by 2-10X, compared to state-of-the-art systems.

Cooperative Memory Expansion via OS Kernel Support for Networked Computing Systems

Cooperative memory expansion (COMEX), an OS kernel extension, establishes a stable pool of memory collectively across the nodes of a cluster and enhances the OS memory subsystem to aggregate memory from connected machines, allowing a process's page table to track remote memory page frames without programmer effort or modifications to application code.

Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device

This paper presents the design and implementation of a high performance networking block device (HPBD) over InfiniBand fabric, which serves as a swap device of kernel virtual memory (VM) system for efficient page transfer to/from remote memory servers.

FaRM: Fast Remote Memory

We describe the design and implementation of FaRM, a new main memory distributed computing platform that exploits RDMA to improve both latency and throughput by an order of magnitude relative to…

TreadMarks: Shared Memory Computing on Networks of Workstations

This work discusses the experience with parallel computing on networks of workstations using the TreadMarks distributed shared memory system, which allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory.

Adaptive Main Memory Compression

  • I. Tuduce, T. Gross
  • Computer Science
    USENIX Annual Technical Conference, General Track
  • 2005
A memory compression solution that adapts the allocation of real memory between uncompressed and compressed pages and also manages fragmentation without user involvement is described.
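The compressed memory region such a system manages can be mimicked with zlib: cold pages are stored compressed and inflated on access, trading CPU for extra effective capacity. The sketch below is illustrative only (hypothetical names) and omits the paper's key contribution, the adaptive sizing of the uncompressed/compressed split.

```python
import zlib

class CompressedPool:
    """Toy compressed-memory region: pages are stored zlib-compressed
    and decompressed on access."""

    def __init__(self):
        self.store = {}               # page_id -> compressed bytes

    def put(self, page_id, data):
        self.store[page_id] = zlib.compress(data)

    def get(self, page_id):
        return zlib.decompress(self.store[page_id])

    def ratio(self):
        """Effective compression ratio across the pool (raw / stored)."""
        raw = sum(len(self.get(p)) for p in self.store)
        packed = sum(len(c) for c in self.store.values())
        return raw / packed if packed else 1.0
```

The observed ratio is what an adaptive scheme would feed back into deciding how much real memory to dedicate to the compressed region versus uncompressed pages.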

The Network RamDisk: Using remote memory on heterogeneous NOWs

This paper describes the design, implementation and evaluation of a Network RamDisk device that uses main memory of remote workstations as a faster-than-disk storage device and proposes various reliability policies, making the device tolerant to single workstation crashes.

System-level implications of disaggregated memory

A software-based prototype by extending the Xen hypervisor to emulate a disaggregated memory design wherein remote pages are swapped into local memory on-demand upon access is developed, showing that low-latency remote memory calls for a different regime of replacement policies than conventional disk paging.

Latency-Tolerant Software Distributed Shared Memory

Grappa enables users to program a cluster as if it were a single, large, non-uniform memory access (NUMA) machine, and addresses deficiencies of previous DSM systems by exploiting application parallelism, trading off latency for throughput.

Using One-Sided RDMA Reads to Build a Fast, CPU-Efficient Key-Value Store

This paper explores the design of a distributed in-memory key-value store called Pilaf that takes advantage of Remote Direct Memory Access to achieve high performance with low CPU overhead and introduces the notion of self-verifying data structures that can detect read-write races without client-server coordination.
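The self-verifying idea hinges on entries carrying their own checksum, so a client whose one-sided read races with a server write observes a checksum mismatch rather than silently using torn data. A minimal sketch of that idea follows, with hypothetical names and CRC32 standing in for whatever checksum the real system uses.

```python
import struct
import zlib

def pack_entry(key, value):
    """Serialize a key-value entry with a leading CRC32 over the payload,
    so a reader can validate the bytes it fetched without contacting the
    server."""
    payload = struct.pack(">I", len(key)) + key + value
    return struct.pack(">I", zlib.crc32(payload)) + payload

def unpack_entry(blob):
    """Return (key, value), or None if the checksum fails, which signals
    a torn read (e.g. a read that raced with a concurrent write)."""
    crc = struct.unpack(">I", blob[:4])[0]
    payload = blob[4:]
    if zlib.crc32(payload) != crc:
        return None   # caller should retry the read
    klen = struct.unpack(">I", payload[:4])[0]
    return payload[4:4 + klen], payload[4 + klen:]
```

On a mismatch the client simply retries, which is what makes server-side coordination unnecessary for reads.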

Disaggregated memory for expansion and sharing in blade servers

It is demonstrated that memory disaggregation can provide substantial performance benefits (on average 10X) in memory constrained environments, while the sharing enabled by the solutions can improve performance-per-dollar by up to 57% when optimizing memory provisioning across multiple servers.

Efficient Memory Disaggregation with Infiniswap

The design and implementation of INFINISWAP is described, a remote memory paging system designed specifically for an RDMA network that increases the overall memory utilization of a cluster and works well at scale.