HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

@inproceedings{HybridDNN,
  title={HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation},
  author={Hanchen Ye and Xiaofan Zhang and Zhize Huang and Gengsheng Chen and Deming Chen},
  booktitle={2020 57th ACM/IEEE Design Automation Conference (DAC)},
  year={2020}
}
To speed up Deep Neural Network (DNN) accelerator design and enable effective implementation, we propose HybridDNN, a framework for building high-performance hybrid DNN accelerators and delivering FPGA-based hardware implementations. Novel techniques include a highly flexible and scalable architecture with a hybrid Spatial/Winograd convolution (CONV) Processing Engine (PE), a comprehensive design space exploration tool, and a complete design flow to fully support accelerator design and… 
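The Winograd convolution mentioned in the abstract trades multiplications for additions via minimal filtering transforms. As a minimal sketch (not the paper's PE design), the 1-D F(2,3) variant computes two outputs of a 3-tap filter with 4 multiplies instead of 6, using the standard transform matrices:

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices (minimal filtering algorithm)
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of a 3-tap FIR (valid correlation) over a 4-element
    input tile, using 4 elementwise multiplies instead of 6."""
    U = G @ g            # filter transform
    V = BT @ d           # input-tile transform
    return AT @ (U * V)  # elementwise product, then inverse transform

def direct(d, g):
    """Reference: direct 3-tap correlation over the same tile."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

The same structure extends to the 2-D F(2x2, 3x3) case used in CONV layers; a hybrid Spatial/Winograd PE can then select between direct and transformed dataflows per layer.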


Accelerator Design and Exploration for Deformable Convolution Networks

This work proposes an efficient design space exploration (DSE) heuristic to generate optimized hybrid accelerators on a given device that achieves up to 24.7 times higher throughput compared with the CPU (GPU) baselines, and up to 4.5 times improvement in effective resource utilization compared with state-of-the-art FPGA accelerators for DCN.

DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator

A novel FPGA-based DNN accelerator design paradigm and its automation tool, called DNNExplorer, are proposed to enable fast exploration of various accelerator designs under the proposed paradigm and deliver optimized accelerator architectures for existing and emerging DNN networks.

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

The motivations and challenges of the algorithm/accelerator co-design problem are discussed, and several effective solutions are provided, including the first simultaneous DNN/FPGA co-design method and a differentiable and efficient DNN and accelerator co-search method.

Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment

An automation tool benchmarks customized DNN hardware accelerators and explores novel accelerator designs with improved performance and efficiency, leveraging a design space exploration engine that generates optimized accelerators by considering targeted AI workloads and available hardware resources.

Resource and Performance Estimation for CNN Models using Machine Learning

Various Machine Learning (ML) models are presented to estimate the Logic Utilization and Computation Time from the Python design descriptions of CNNs in negligible time before running HLS synthesis.

Robust Estimation of FPGA Resources and Performance from CNN Models

A machine learning-based two-stage estimator assists in designing an FPGA-based CNN accelerator, and the proposed estimation methodology enables a software engineer to obtain rapid and accurate estimates of the final implementation Quality of Results without executing FPGA design flows.
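The two summaries above describe learned estimators that predict synthesis results from model descriptions. As an illustrative single-stage sketch (not either paper's actual method), one can fit a least-squares regressor from per-layer features to a resource metric; all feature names and numbers below are hypothetical placeholders:

```python
import numpy as np

# Hypothetical training set: per-layer features (MAC count, parameter
# count, weight bit-width) paired with post-synthesis LUT usage.
# All values are illustrative placeholders, not real synthesis data.
features = np.array([
    [1.0e6, 3.0e4,  8],
    [4.0e6, 1.2e5,  8],
    [9.0e6, 2.5e5, 16],
    [2.0e6, 6.0e4, 16],
    [6.0e6, 1.8e5,  8],
])
luts = np.array([5200.0, 14800.0, 52000.0, 15500.0, 21000.0])

# Ordinary least squares on [features | bias]
X = np.hstack([features, np.ones((features.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X, luts, rcond=None)

def estimate_luts(macs, params, bits):
    """Predict LUT usage for a new layer in negligible time,
    before running any HLS or synthesis flow."""
    return np.array([macs, params, bits, 1.0]) @ coef
```

A production estimator would use richer features (loop tiling factors, memory port counts) and a nonlinear model, but the workflow is the same: train once on synthesized designs, then query instantly during design space exploration.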

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

Gemmini is an open-source, full-stack DNN accelerator generator that generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects.

ICCAD: G: Bridge the Hardware-Software Gap: Exploring End-to-End Design Flows for Building Efficient AI Systems

End-to-end design flows are explored to bridge the software-hardware gap and deliver efficient AI systems for various applications and hardware setups.

End to End Framework for CNN Acceleration on FPGAs with Dynamic Algorithm Mapping

This work develops an API for automatically extracting the ONNX graph from high-level programming libraries such as PyTorch and TensorFlow, and defines an HLS template of the accelerator supporting three parallel algorithm choices for convolution operations to increase the productivity of generating hardware.

DeepBurning-SEG: Generating DNN Accelerators of Segment-Grained Pipeline Architecture

This work proposes a novel class of design solution for DNN acceleration, segment-grained pipeline architecture (SPA), and introduces an automated design framework, AutoSeg, that includes a parameterized SPA accelerator template and a co-design engine that will generate the efficient model segmentation solution and hardware pipeline design parameters for the acceleration workload.

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

DNNBuilder, an automatic design space exploration tool to generate optimized parallelism guidelines by considering external memory access bandwidth, data reuse behaviors, FPGA resource availability, and DNN complexity, is designed and demonstrated.

Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA

This work proposes a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes, which facilitates the design space exploration and simplifies the tradeoff between network accuracy and computation efficiency.

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

Results show that the proposed DNN model and accelerator outperform the state-of-the-art FPGA designs in all aspects including Intersection-over-Union (IoU) and energy efficiency.

Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs

This paper implements CNN on an FPGA using a systolic array architecture, which can achieve high clock frequency under high resource utilization, and provides an analytical model for performance and resource utilization and develops an automatic design space exploration framework.
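A systolic array computes a matrix product by streaming operands through a grid of processing elements with skewed timing. As a minimal behavioral sketch (not the paper's synthesized architecture), an output-stationary array can be simulated cycle by cycle, where PE (i, j) receives reduction step s of C[i, j] at cycle t = i + j + s:

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an output-stationary systolic array:
    rows of A stream in from the left, columns of B from the top, each
    skewed by its PE coordinate; every PE accumulates one element of C."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    # Enough cycles for the last operand to reach the farthest PE.
    for t in range(n + m + k - 2):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # reduction step arriving at PE (i, j) now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C
```

The skewing is what lets hardware keep every PE busy with purely local, nearest-neighbor communication, which is why systolic designs sustain high clock frequencies at high resource utilization.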

DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family

A design automation tool allows application developers to build, from scratch, learning accelerators that target their specific NN models with custom configurations and optimized performance, greatly simplifying the design flow of NN accelerators for machine learning and AI application developers.

Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

An analytical performance model is proposed, an in-depth analysis of the resource requirements of CNN classifier kernels and the available resources on modern FPGAs is performed, and a new kernel design is proposed to effectively address this bandwidth limitation and provide an optimal balance between computation, on-chip, and off-chip memory access.

Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs

This paper proposes a novel architecture for implementing the Winograd algorithm on FPGAs, along with an analytical model to predict resource usage and reason about performance, and uses the model to guide a fast design space exploration.

TGPA: Tile-Grained Pipeline Architecture for Low Latency CNN Inference

The Tile-Grained Pipeline Architecture (TGPA) is proposed, a heterogeneous design which supports pipelining execution of multiple tiles within a single input image on multiple heterogeneous accelerators.

High-performance video content recognition with long-term recurrent convolutional network for FPGA

A design framework for DNNs is presented that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM) to further improve the FPGA solution.

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

The proposed AutoDNNchip is a DNN chip generator that can automatically produce both FPGA- and ASIC-based DNN chip implementations from DNNs developed in machine learning frameworks, without humans in the loop, and can achieve better performance than expert-crafted state-of-the-art FPGA and ASIC designs.