ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing

  • Cheng Tan, Chenhao Xie, Tong Geng, Andrés Márquez, Antonino Tumeo, Kevin J. Barker, Ang Li
  • Published 10 November 2020
  • Computer Science
  • IEEE Transactions on Parallel and Distributed Systems
Next-generation HPC systems and data centers are likely to be reconfigurable and data-centric, given the trend toward hardware specialization and the emergence of data-driven applications. In this article, we propose ARENA, an asynchronous reconfigurable accelerator ring architecture, as a potential scenario of what future HPC systems and data centers will look like. Despite using coarse-grained reconfigurable arrays (CGRAs) as the substrate platform, our key contribution is not only the CGRA-cluster… 

A Data-Centric Accelerator for High-Performance Hypergraph Processing

A novel data-centric Load-Trigger-Reduce (LTR) execution model is proposed to fully exploit the locality in hypergraph processing. An LTR-driven hypergraph accelerator, XuLin, is also architected; it features an adaptive data loading mechanism that minimizes loading cost by merging chunks at runtime.

SO(DA)2: End-to-end Generation of Specialized Reconfigurable Architectures (Invited Talk)

This talk discusses the Software Defined Architectures for Data Analytics (SO(DA)2) toolchain, an end-to-end hardware/software codesign framework that generates custom reconfigurable architectures for data analytics applications, and considers partial dynamic reconfiguration as a key element of the system design.



Plasticine: A reconfigurable architecture for parallel patterns

This work presents Plasticine, a new spatially reconfigurable architecture designed to efficiently execute applications composed of parallel patterns. It provides an improvement of up to 76.9× in performance-per-Watt over a conventional FPGA across a wide range of dense and sparse applications.

Coarse-Grained Reconfigurable Array Architectures

The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.

Integrating Reconfigurable Hardware-Based Grid for High Performance Computing

Experimental results show that the proposed architecture offers encouraging advantages for deploying high-performance distributed applications and simplifies the development process.

X10: an object-oriented approach to non-uniform cluster computing

A modern object-oriented programming language, X10, is designed for high-performance, high-productivity programming of non-uniform cluster computing (NUCC) systems. The paper presents an overview of the X10 programming model and language, experience with the reference implementation, and results from some initial productivity comparisons between the X10 and Java™ languages.

Polymorphic Pipeline Array: A flexible multicore accelerator with virtualized execution for mobile multimedia applications

This work focuses on the design of a programmable, low-power accelerator for multimedia algorithms referred to as a Polymorphic Pipeline Array, or PPA, which is designed with flexibility and programmability as first-order requirements to enable the hardware to be dynamically customizable to the application.

RC3E: Reconfigurable Accelerators in Data Centres and Their Provision by Adapted Service Models

This paper presents the development of an FPGA cloud architecture, beginning with realistic use cases and adapted service models for the use of reconfigurable hardware accelerators in a cloud context. It also describes a special resource management system (RC3E) that serves as a hypervisor for the virtualized hardware.

GASNet-EX: A High-Performance, Portable Communication Library for Exascale

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines.

CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit

Timing experiments demonstrate that for a 4-way out-of-order superscalar processor Chimaera results in average performance improvements of 21%, assuming a very aggressive core processor design (most pessimistic RFU latency model) and communication overheads from and to the RFU.

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing Web search engine by nearly a factor of two.

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center

This framework generates the OpenStack calls needed to reserve the compute devices, creates the network connections (and retrieves MAC addresses), generates the bitstreams, programs the devices, and configures them with the appropriate MAC addresses, creating a ready-to-use network device that can interact with any other network device in the data center.