Optimization of FPGA-based CNN Accelerators Using Metaheuristics

by Sadiq M. Sait, Aiman H. El-Maleh, Mohammad Altakrouri, and Ahmad Shawahna
In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields with accuracy that was not possible before. However, this comes with extensive computational requirements that general-purpose central processing units (CPUs) cannot meet at the desired real-time performance. At the same time, field-programmable gate arrays (FPGAs) have seen a surge in interest for accelerating CNN inference. This is due to their ability to create custom…
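The paper's central idea, using a metaheuristic to search the accelerator design space, can be sketched with a toy simulated-annealing loop. The cost model and the two tiling parameters below are illustrative assumptions for the sketch, not the paper's actual formulation:

```python
import math
import random

# Hedged sketch: simulated-annealing design-space exploration over two
# hypothetical tiling parameters of a CNN accelerator. The cost model
# is a made-up placeholder (favoring balanced tiles within a pretend
# on-chip resource budget of 512 units), not the paper's model.

def cost(tile_m, tile_n):
    resource = tile_m * tile_n
    penalty = 1e6 if resource > 512 else 0  # infeasible designs
    # Imbalanced tiles and under-used resources both raise the cost.
    return abs(tile_m - tile_n) + 4096 / resource + penalty

def anneal(steps=5000, t0=10.0, seed=0):
    rng = random.Random(seed)
    state = (8, 8)                      # initial design point
    cur_cost = cost(*state)
    best, best_cost = state, cur_cost
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-3  # cooling schedule
        cand = (max(1, state[0] + rng.choice((-1, 1))),
                max(1, state[1] + rng.choice((-1, 1))))
        c = cost(*cand)
        # Always accept improvements; accept worse moves with a
        # temperature-dependent probability to escape local minima.
        if c < cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            state, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost
```

The same loop structure applies to richer design spaces (loop unroll factors, buffer sizes, precision choices); only the neighborhood move and the cost model change.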

Maximizing CNN accelerator efficiency through resource partitioning

This work presents a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers.
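Dividing a CNN's layers among multiple tailored processors is, at its core, a load-partitioning problem. A minimal sketch, assuming hypothetical per-layer FLOP counts and a greedy longest-processing-time heuristic (not the paper's actual methodology):

```python
import heapq

# Hedged sketch: assign CNN layers to a fixed number of processors so
# that compute load is balanced, illustrating the resource-partitioning
# idea. Layer names and FLOP counts below are invented examples.

def partition_layers(layer_flops, num_procs):
    # Min-heap of (current load, processor id); heaviest layers first.
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    for layer, flops in sorted(layer_flops.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)   # least-loaded processor
        assignment[layer] = p
        heapq.heappush(heap, (load + flops, p))
    return assignment
```

For example, `partition_layers({"conv1": 5, "conv2": 3, "conv3": 3, "conv4": 1}, 2)` balances both processors at a load of 6.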

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

This work presents a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering the FPGA's resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth.

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which outperforms previous approaches significantly.

FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review

The techniques investigated in this paper represent the recent trends in the FPGA-based accelerators of deep learning networks and are expected to direct the future advances on efficient hardware accelerators and to be useful for deep learning researchers.

FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit

This paper implemented the LeNet-5 CNN architecture, which performs classification of handwritten digits using the MNIST handwritten digit dataset, and implemented an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type.
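The essence of an approximate fixed-point multiply-accumulate unit can be shown in a few lines. A hedged sketch, assuming a Q-format with 4 fractional bits (the actual word lengths in the paper may differ):

```python
# Hedged sketch: fixed-point MAC with truncation instead of rounding,
# illustrating the kind of resource-optimized approximate accumulate
# unit the paper describes. FRAC_BITS = 4 is an assumed format.

FRAC_BITS = 4

def to_fixed(x):
    # Quantize a float to a signed fixed-point integer (Q*.4).
    return int(round(x * (1 << FRAC_BITS)))

def approx_mac(acc, a, b):
    # The product carries 2*FRAC_BITS fractional bits; truncating the
    # low bits (a plain shift, no rounding logic) is what saves
    # hardware, at a small accuracy cost.
    prod = (a * b) >> FRAC_BITS
    return acc + prod

def from_fixed(x):
    return x / (1 << FRAC_BITS)
```

For example, accumulating the products of weights `[0.5, -0.25]` with inputs `[1.0, 2.0]` yields 0.0, matching the floating-point dot product here; in general the truncation introduces a small negative bias per product.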

FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks

This paper proposes a flexible dataflow architecture (FlexFlow) that can leverage the complementary effects among feature-map, neuron, and synapse parallelism to mitigate the mismatch between the parallel types supported by the computing engine and the dominant parallel types of CNN workloads.

Optimized Memory Allocation and Power Minimization for FPGA-Based Image Processing

This paper proposes methods for generating memory architectures, from both Hardware Description Languages and High Level Synthesis designs, which minimize memory usage and power consumption, and demonstrates how the proposed partitioning algorithms can outperform traditional strategies.

A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration

To improve the efficiency of deep learning research, this review focuses on three aspects: quantized/binarized models, optimized architectures, and resource-constrained systems.

Polyhedral-based data reuse optimization for configurable computing

This work uses the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management and implements a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.

FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation

A novel framework referred to as the Fixed-Point Quantizer of deep neural Networks (FxP-QNet) is proposed that flexibly designs a mixed low-precision DNN for integer-arithmetic-only deployment and empirically demonstrates the effectiveness of FxP-QNet in achieving the accuracy-compression trade-off without the need for training.
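The core of dynamic fixed-point quantization is choosing, per tensor, how to split a fixed word length between integer and fractional bits based on the data's range. A minimal sketch under assumed settings (8-bit words, per-tensor scaling; FxP-QNet's actual selection procedure is more involved):

```python
import math

# Hedged sketch of post-training dynamic fixed-point quantization in
# the spirit of FxP-QNet. The 8-bit word length and the simple
# range-based bit split are assumptions for illustration.

def choose_frac_bits(values, word_len=8):
    # One sign bit; give the integer part just enough bits for the
    # largest magnitude, and the fraction gets the remainder.
    max_abs = max(abs(v) for v in values)
    int_bits = max(0, math.ceil(math.log2(max_abs + 1e-12)))
    return word_len - 1 - int_bits

def quantize(values, word_len=8):
    fb = choose_frac_bits(values, word_len)
    scale = (1 << fb) if fb >= 0 else 1.0 / (1 << -fb)
    lo, hi = -(1 << (word_len - 1)), (1 << (word_len - 1)) - 1
    q = [min(hi, max(lo, round(v * scale))) for v in values]
    # Return the dequantized values and the chosen fractional width.
    return [x / scale for x in q], fb
```

A tensor with values in [-1.5, 0.75] gets 1 integer bit and 6 fractional bits under this scheme, so the quantization step is 1/64; a tensor with a wider range would automatically trade fractional precision for integer range.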