AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators

  • Atefeh Sohrabizadeh, Cody Hao Yu, Min Gao, Jason Cong
  • The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still must manually perform code reconstruction and cumbersome parameter tuning to achieve the optimal performance. While many learning models have been leveraged by existing work to automate the design of efficient accelerators… 

OverGen: Improving FPGA Usability through Domain-specific Overlay Generation

The essential idea is to develop a hardware generation framework targeting a highly customizable overlay, so that the abstraction gap can be lowered by tuning the design instance to applications of interest. The generated designs are highly competitive with fixed-function HLS-based designs, even though they remain programmable with fast reconfiguration.

Bring Your Own Codegen to Deep Learning Compiler

This paper proposes an open source framework that lets users concentrate solely on developing their proprietary code generation tools by reusing as many components of existing deep learning compilers as possible.

Enabling Automated FPGA Accelerator Optimization Using Graph Neural Networks

This paper proposes to solve the problem of developing a high-performance architecture by modeling the HLS tool with a graph neural network (GNN) that is trained for use across a wide range of applications and can estimate the quality of a design in milliseconds with high accuracy.

AutoSA: A Polyhedral Compiler for High-Performance Systolic Arrays on FPGA

This work presents AutoSA, an end-to-end compilation framework for generating systolic arrays on FPGA, based on the polyhedral framework, and further incorporates a set of optimizations on different dimensions to boost performance.

Improving GNN-based accelerator design automation with meta learning

Experiments show the MAML-enhanced model outperforms a simple baseline based on fine tuning in terms of both offline evaluation on hold-out test sets and online evaluation for DSE speedup results.

Automated Accelerator Optimization Aided by Graph Neural Networks

The experimental results demonstrate that, by employing the GNN-based model, the quality of a design can be estimated in milliseconds with high accuracy, which results in an average speedup of 55x for optimizing the design compared to the previous state-of-the-art work.

High-Level Synthesis Hardware Design for FPGA-Based Accelerators: Models, Methodologies, and Frameworks

A survey of models, methodologies, and frameworks proposed for metric estimation, FPGA-based DSE, and power consumption estimation on FPGA/SoC is presented, along with the integration of existing models and frameworks in diverse research areas.

Challenges Designing for FPGAs Using High-Level Synthesis

Several cases where an anticipated performance improvement was either not realized or resulted in decreased performance are reported, pointing to a number of improvements that are needed for HLS tool flows, including a strong need for performance modeling that can reliably guide the compilation optimization process.

ScaleHLS: Scalable High-Level Synthesis through MLIR

ScaleHLS, a next-generation HLS compilation flow built on top of the multi-level compiler infrastructure MLIR, is proposed. For the first time, it can represent and optimize HLS designs at multiple levels of abstraction, and it provides an HLS-dedicated transform and analysis library to solve optimization problems at the suitable representation levels.

PYXIS: An Open-Source Performance Dataset Of Sparse Accelerators

  • Linghao Song, Yuze Chi, J. Cong
  • Computer Science
    ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2022
PYXIS, a performance dataset for customized accelerators on sparse data, collects accelerator designs and real execution performance statistics, and can benefit researchers working on accelerators, architecture, performance, algorithms, and many related topics.



Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

The composable, parallel and pipeline (CPP) microarchitecture is proposed as an accelerator design template to substantially reduce the design space and the AutoAccel framework is developed to automate the entire accelerator generation process.

Automatic Generation of Efficient Accelerators for Reconfigurable Hardware

A hybrid area estimation technique which uses template-level models and design-level artificial neural networks to account for effects from hardware place-and-route tools, including routing overheads, register and block RAM duplication, and LUT packing is described.

S2FA: An Accelerator Automation Framework for Heterogeneous Computing in Datacenters

S2FA (Spark-to-FPGA-Accelerator), an automation framework that generates FPGA accelerator designs from Apache Spark programs written in Scala, bridges the semantic gap between object-oriented languages and HLS C while achieving high performance using learning-based design space exploration.

Predictable accelerator design with time-sensitive affine types

A type system that restricts HLS to programs that can predictably compile to hardware accelerators is proposed and implemented in Dahlia, a language that compiles to HLS C++, and it is shown that it can reduce the size of HLS parameter spaces while accepting Pareto-optimal designs.

Design Space exploration of FPGA-based accelerators with multi-level parallelism

MPSeeker, a rapid estimation framework, evaluates performance/area metrics of various accelerator options for an application at an early design phase. It can rapidly (in minutes) explore the complex design space and accurately estimate the performance/area of various design points to identify the near-optimal combination of parallelism options.

Generating Configurable Hardware from Parallel Patterns

This paper presents a general representation of tiled parallel patterns, provides rules for automatically tiling patterns and generating metapipelines, and demonstrates experimentally that these optimizations result in speedups of up to 39.4× on a set of benchmarks from the data analytics domain.

A Parallel Bandit-Based Approach for Autotuning FPGA Compilation

This work studies the effectiveness of applying the multi-armed bandit (MAB) technique to automatically tune the options for a complete FPGA compilation flow from RTL to bitstream, including RTL/logic synthesis, technology mapping, placement, and routing.

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

This work creates a stencil accelerator using Intel FPGA SDK for OpenCL that achieves high performance without input size restrictions by combining spatial and temporal blocking, and employs multiple FPGA-specific optimizations to tackle issues arising from the added design complexity.

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing

Experimental results show that HeteroCL allows programmers to explore the design space efficiently in both performance and accuracy by combining different types of hardware customization and targeting spatial architectures, while keeping the algorithm code intact.