ECI: a Customizable Cache Coherency Stack for Hybrid FPGA-CPU Architectures

@article{Ramdas2022ECIAC,
  title={ECI: a Customizable Cache Coherency Stack for Hybrid FPGA-CPU Architectures},
  author={Abishek Ramdas and Michael J. Giardino and Runbin Shi and Adam Turowski and David A. Cock and Gustavo Alonso and Timothy Roscoe},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.07124}
}
Unlike other accelerators, FPGAs are capable of supporting cache coherency, thereby turning them into a more powerful architectural option than just a peripheral accelerator. However, most existing deployments of FPGAs are either non-cache coherent or support only an asymmetric design where cache coherency is controlled from the CPU. Taking advantage of a recently released two socket CPU-FPGA architecture, in this paper we describe A Customizable Caching Interface (ACCI), a flexible… 

References

Enzian: an open, general, CPU/FPGA platform for systems software research

It is shown that a research group can design and build a more general, open, and affordable hardware platform for hybrid systems research, and Enzian is capable of duplicating the functionality of existing CPU/FPGA systems with comparable performance but in an open, flexible system.

NoC-Based Support of Heterogeneous Cache-Coherence Models for Accelerators

This work proposes an extension of a standard directory-based cache-coherence protocol and presents its design as part of a scalable memory hierarchy implemented over a NoC. It also designs a many-accelerator SoC architecture that can support three main cache-coherence models for accelerators: non-coherent, last-level-cache-coherent, and fully coherent.

CAPI: A Coherent Accelerator Processor Interface

The Coherent Accelerator Processor Interface (CAPI) enables attaching an accelerator as a coherent CPU peer over the I/O physical interface, and greatly increases the opportunities for acceleration due to the much shorter software path length required to enable its use compared to a traditional I/O model.

Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems

A set of sophisticated benchmarks for latency and bandwidth measurements to arbitrary locations in the memory subsystem is presented, and the coherency state of cache lines is considered in order to analyze the cache coherency protocols and their performance impact.

Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ

This is the first work that presents detailed practical comparisons of the speed and energy efficiency of various processor-accelerator memory sharing techniques in a configurable heterogeneous platform.

Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures

This work integrates the hardware accelerator into MonetDB, a main-memory column store, and demonstrates a significant improvement in response time and throughput, and provides a novel and efficient implementation of two commonly used SQL operators for strings.

Exploring Portability and Performance of OpenCL FPGA Kernels on Intel HARPv2

This work targets the second iteration of the HARPv2 platform using HLS through porting of OpenCL kernels originally written for FPGAs connected via a PCIe bus, and explores the portability of kernels through a hardware design space search, and empirically shows the benefits of using the shared virtual memory (SVM) abstraction over explicit reads and writes.

Project PBerry: FPGA Acceleration for Remote Memory

This approach uses emerging cache-coherent FPGAs to expose cache coherence events to the operating system and enables other use cases, such as live virtual machine migration, unified virtual memory, security and code analysis, which open up many promising research directions.

CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators

CoNDA is proposed, a coherence mechanism that lets an NDA optimistically execute an NDA kernel, under the assumption that the NDA has all necessary coherence permissions; this optimistic execution allows CoNDA to gather information on the memory accesses performed by the NDA and by the rest of the system.

IBM POWER9 opens up a new era of acceleration enablement: OpenCAPI

Open Coherent Accelerator Processor Interface (OpenCAPI) is a new industry-standard device interface that enables the development of host-agnostic devices that can coherently connect to any host