Xar-trek: run-time execution migration among FPGAs and heterogeneous-ISA CPUs

Authors: Edson Lemos Horta, Ho-Ren Chuang, Naarayanan Rao VSathish, Cesar J. Philippidis, Antonio Barbalace, Pierre Olivier, Binoy Ravindran
Venue: Proceedings of the 22nd International Middleware Conference
Datacenter servers are increasingly heterogeneous: from x86 host CPUs, to ARM or RISC-V CPUs in NICs/SSDs, to FPGAs. Previous works have demonstrated that migrating application execution at run-time across heterogeneous-ISA CPUs can yield significant performance and energy gains, with relatively little programmer effort. However, FPGAs have often been overlooked in that context: hardware acceleration using FPGAs involves statically implementing select application functions, which prohibits… 


Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures
This paper compares collaborative execution techniques (namely, data partitioning and task partitioning) and evaluates the tradeoffs between them. It finds that the partitioning strategies pose different tradeoffs but generally outperform execution on conventional CPU-FPGA systems that use no collaborative execution strategy.
Flick: Fast and Lightweight ISA-Crossing Call for Heterogeneous-ISA Environments
Experiments with microbenchmarks and a BFS application show that Flick requires only minor changes to the existing OS and software, and incurs only 18 µs round-trip overhead for migrating a thread through PCIe, which is at least 23x faster than prior work.
Breaking the Boundaries in Heterogeneous-ISA Datacenters
This work presents a new multi-ISA binary architecture and heterogeneous-OS containers for facilitating efficient migration of natively-compiled applications and demonstrates energy savings of up to 66% for a workload running on an ARM and an x86 server interconnected by a high-speed network.
AIRA: A Framework for Flexible Compute Kernel Execution in Heterogeneous Platforms
This work introduces AIRA, a compiler and runtime for flexible execution of applications in CPU-GPU platforms, and demonstrates up to a 3.78x speedup in benchmarks from Rodinia and Parboil, run with various workloads on a server-class platform.
A Study of Pointer-Chasing Performance on Shared-Memory Processor-FPGA Systems
The paper explores the trade-offs over a wide range of implementation options available on shared-memory processor-FPGA architectures, including using tightly-coupled processor assistance, and shows that the FPGA fabric is least efficient when traversing a single list with non-sequential node layout and a small payload size.
Dynamic application reconfiguration on heterogeneous hardware
Through TornadoVM, a virtual machine capable of reconfiguring applications, at runtime, for hardware acceleration based on the currently available hardware resources, this paper introduces a new level of compilation in which applications can benefit from heterogeneous hardware.
Feature-Aware Task Scheduling on CPU-FPGA Heterogeneous Platforms
Peilun Du, Zichang Sun, Haitao Zhang, Huadong Ma. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2019.
This paper conducts an in-depth analysis of workload features on CPU-FPGA platforms, and presents a task speedup learning method based on the static and dynamic features extracted from the resource capabilities, code structure, and running task description.
HEXO: Offloading HPC Compute-Intensive Workloads on Low-Cost, Low-Power Embedded Systems
It is shown that sharing long-running, compute-intensive datacenter HPC workloads between a server machine and one or a few connected embedded boards of negligible cost and power consumption can bring significant benefits in terms of consolidation.
Popcorn: bridging the programmability gap in heterogeneous-ISA platforms
A new software architecture is proposed that is composed of an operating system and a compiler framework to run ordinary shared memory applications, written for homogeneous machines, on OS-capable heterogeneous-ISA machines, and is shown to be up to 6.2 times faster than an offloading programming model.
Lynx: A SmartNIC-driven Accelerator-centric Architecture for Network Servers
Lynx is an accelerator-centric network server architecture that offloads the server data and control planes to the SmartNIC and enables direct networking from accelerators via a lightweight, hardware-friendly I/O mechanism, allowing hardware-accelerated network servers to run without CPU involvement.