An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks

  • Yufei Ma, Yu Cao, Sarma B. K. Vrudhula, Jae-sun Seo
  • Published 1 September 2017
  • Computer Science
  • 2017 27th International Conference on Field Programmable Logic and Applications (FPL)
Convolutional neural networks (CNNs) are rapidly evolving and being applied to a broad range of applications. […] The proposed methodology is demonstrated with end-to-end FPGA implementations of various CNN algorithms (e.g., NiN, VGG-16, ResNet-50, and ResNet-152) on two standalone Intel FPGAs, Stratix V and Arria 10. The performance and overhead of the automated compilation are evaluated. The compiled FPGA accelerators exhibit superior performance compared to state-of-the-art automation-based works…
Automatic Compilation of Diverse CNNs Onto High-Performance FPGA Accelerators
This paper presents a register-transfer level (RTL) CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, enabling fast high-level prototyping of CNNs from software to FPGAs while keeping the benefits of low-level hardware optimization.
Automatic Compiler Based FPGA Accelerator for CNN Training
This work presents an automatic compiler-based FPGA accelerator with 16-bit fixed-point precision for complete CNN training, including Forward Pass (FP), Backward Pass (BP), and Weight Update (WU). It implements an optimized RTL library to perform training-specific tasks and develops an RTL compiler to automatically generate FPGA-synthesizable RTL based on user-defined constraints.
A Novel Software-Defined Convolutional Neural Networks Accelerator
A software-defined accelerator for convolutional neural networks that preserves high performance while maintaining flexibility and can be implemented on an FPGA.
f-CNNx: A Toolflow for Mapping Multiple Convolutional Neural Networks on FPGAs
The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ
Performance Modeling for CNN Inference Accelerators on FPGA
A performance model is described to estimate the performance and resource utilization of an FPGA implementation and it is shown that the performance bottleneck and design bound can be identified and the optimal design option can be explored early in the design phase.
Toolflows for Mapping Convolutional Neural Networks on FPGAs
A survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods, and achieved performance.
A survey of FPGA-based accelerators for convolutional neural networks
  • Sparsh Mittal
  • Computer Science
  • Neural Computing and Applications
  • 2018
A survey of techniques for implementing and optimizing CNN algorithms on FPGA is presented and is expected to be useful for researchers in the area of artificial intelligence, hardware architecture and system design.
FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review
The techniques investigated in this paper represent the recent trends in the FPGA-based accelerators of deep learning networks and are expected to direct the future advances on efficient hardware accelerators and to be useful for deep learning researchers.
Accelerating CNN inference on FPGAs: A Survey
The methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators and will fuel future advances in efficient hardware deep learning.


Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA
This work quantitatively analyzes the compiler's design strategy to optimize the throughput of a given CNN model under the FPGA resource constraints, and demonstrates the promise of the automatic compiler solution for modularized and scalable hardware acceleration of deep learning.
DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family
A design automation tool that allows application developers to build, from scratch, learning accelerators that target their specific NN models with custom configurations and optimized performance, greatly simplifying the design flow of NN accelerators for machine learning or AI application developers.
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
From high-level deep neural models to FPGAs
DnnWeaver is a framework that automatically generates a synthesizable accelerator for a given (DNN, FPGA) pair from a high-level specification in Caffe, best matching the needs of the DNN while providing high performance and efficiency gains for the target FPGA.
fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs
Convolutional Neural Networks (ConvNets) are a powerful Deep Learning model, providing state-of-the-art accuracy to many emerging classification problems. However, ConvNet classification is a
Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks
This work systematically explores the trade-offs of hardware cost by searching the design variable configurations, and proposes a specific dataflow for hardware CNN acceleration that minimizes memory access and data movement while maximizing resource utilization to achieve high performance.
C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization
This work has proposed a novel deep learning accelerator, which offers multiple types of data-level parallelism: inter-kernel, intra-kernel and hybrid, and can adaptively switch among the three types of parallelism and the corresponding data tiling schemes to dynamically match different networks or even different layers of a single network.
FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates
  • Yijin Guan, Hao Liang, J. Cong
  • Computer Science
  • 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
  • 2017
FP-DNN (Field Programmable DNN), an end-to-end framework that takes TensorFlow-described DNNs as input, and automatically generates the hardware implementations on FPGA boards with RTL-HLS hybrid templates, is proposed.
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
FINN, a framework for building fast and flexible FPGA accelerators, is presented. It uses a flexible heterogeneous streaming architecture that implements fully connected, convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements.
Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks
This paper designs and implements Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs, with a key focus on bandwidth optimization through memory access reorganization not studied in prior work.