Toolflows for Mapping Convolutional Neural Networks on FPGAs

Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis
ACM Computing Surveys (CSUR), pp. 1-39
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep-learning ecosystem to provide a tunable balance… 


fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs

Since the renaissance of neural networks, convolutional neural networks (ConvNets) have demonstrated state-of-the-art performance in several emerging artificial intelligence tasks. The deployment of

f-CNNx: A Toolflow for Mapping Multiple Convolutional Neural Networks on FPGAs

The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ

Semi-Streaming Architecture: A New Design Paradigm for CNN Implementation on FPGAs.

A set of five layer-specialized configurable processing engines for implementing an 8-bit quantized MobileNetV2 CNN model is presented. The engines are chained to partially preserve data streaming and tuned individually to efficiently process specific types of layers.

mNet2FPGA: A Design Flow for Mapping a Fixed-Point CNN to Zynq SoC FPGA

A CNN core architecture called mNet2FPGA is proposed that places a trained CNN on an SoC FPGA. The hardware architecture is based on Advanced eXtensible Interface (AXI) stream processing with simultaneous bidirectional transfers between RAM and the CNN core.

Accelerating Convolutional Neural Networks in FPGA-based SoCs using a Soft-Core GPU

This work investigates using a soft-core Graphics Processing Unit (GPU), implemented in the FPGA, to execute different Convolutional Neural Networks (CNNs), and shows the potential of the collaborative execution of CNNs using these two platforms together.

FPGA Implementation of MobileNetV2 CNN Model Using Semi-Streaming Architecture for Low Power Inference Applications

  • Nazariy K. Shaydyuk, E. John
  • Computer Science
    2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)
  • 2020
A set of five layer-specialized configurable processing engines for implementing an 8-bit quantized MobileNetV2 CNN model is presented. The engines are chained to partially preserve data streaming and tuned individually to efficiently process specific types of layers.

MulMapper: Towards an Automated FPGA-Based CNN Processor Generator Based on a Dynamic Design Space Exploration

It has been verified that early-stage MulMapper can lead to synthesis of resource-optimized CNN processor hardware IP that can be used for many regular CNN variants.

unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights Generation

This work investigates the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time and presents unzipFPGA, a framework to train on-the-fly models and traverse the design space to select the highest performing CNN engine configuration.

Dissecting Convolutional Neural Networks for Efficient Implementation on Constrained Platforms

It is shown that modifying specific network design parameters, such as filter size, the number of fully connected layers, and subsampling techniques, has a considerable impact on overall performance and efficiency, enabling informed trade-offs and optimization.

An FPGA Overlay for CNN Inference with Fine-grained Flexible Parallelism

An FPGA overlay for efficient processing of CNNs, which can be scaled based on the available compute and memory resources of the FPGA, is proposed and studied by using it to process AlexNet, VGG16, YOLO, MobileNet, and ResNet-50 CNNs targeting Virtex-7 and larger UltraScale+ VU9P FPGAs.



Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA

This work quantitatively analyzes the compiler's design strategy to optimize the throughput of a given CNN model under FPGA resource constraints, and demonstrates the promise of the automatic compiler solution for modularized and scalable hardware acceleration of deep learning.

fpgaConvNet: Automated Mapping of Convolutional Neural Networks on FPGAs (Abstract Only)

In recent years, Convolutional Neural Networks (ConvNets) have become the state-of-the-art in several Artificial Intelligence tasks. Across the range of applications, the performance needs vary

Caffeinated FPGAs: FPGA framework For Convolutional Neural Networks

This work presents a modified version of the popular CNN framework Caffe with FPGA support, which allows for classification using CNN models and specialized FPGA implementations with the flexibility of reprogramming the device when necessary, seamless memory transactions between host and device, simple-to-use test benches, and the ability to create pipelined layer implementations.

Optimizing Frequency Domain Implementation of CNNs on FPGAs

An algorithm-architecture co-design methodology is presented, based on the computational characteristics of CNN models and the features of the underlying hardware, to realize high-performance designs that speed up various CNN models.

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

The design of a BNN accelerator is presented that is synthesized from C++ to FPGA-targeted Verilog and outperforms existing FPGA-based CNN accelerators in GOPS as well as energy and resource efficiency.

Tactics to Directly Map CNN Graphs on Embedded FPGAs

The feasibility of so-called direct hardware mapping (DHM) is demonstrated, and several tactics are discussed to make DHM usable in practice. As a proof of concept, the HADDOC2 open-source tool is presented, which automatically transforms a CNN description into a synthesizable hardware description with platform-independent DHM.

F-CNN: An FPGA-based framework for training Convolutional Neural Networks

  • Wenlai Zhao, H. Fu, Guangwen Yang
  • Computer Science
    2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
  • 2016
The proposed framework is based on reconfiguring a streaming datapath at runtime to cover the training cycle for the various layers in a CNN. Results indicate that the proposed module design targeting Maxeler technology can achieve a performance of 62.06 GFLOPS for 32-bit floating-point arithmetic, outperforming existing accelerators.

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

This work presents a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering FPGA resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth.

Automatic code generation of convolutional neural networks in FPGA implementation

This paper proposes parallel structures that exploit the inherent parallelism, together with efficient computation units that perform the operations in convolutional and fully-connected layers, and an automatic generator that produces Verilog HDL source code according to a high-level hardware description language.

An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks

This work presents an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable high-level fast prototyping of CNNs from software to FPGAs and still keep the benefits of low-level hardware optimization.