REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs

@inproceedings{Ding2019REQYOLOAR,
  title={REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs},
  author={Caiwen Ding and Shuo Wang and Ning Liu and Kaidi Xu and Yanzhi Wang and Yun Liang},
  booktitle={Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
  year={2019}
}
Deep neural networks (DNNs), as the basis of object detection, will play a key role in the development of future autonomous systems with full autonomy. [...] Key method: We adopt the block-circulant matrix method and propose a heterogeneous weight quantization using the Alternating Direction Method of Multipliers (ADMM), an effective optimization technique for general, non-convex optimization problems.
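The abstract's block-circulant compression rests on a standard identity: a circulant block's matrix-vector product is a circular convolution, computable in O(n log n) via the FFT instead of O(n^2). A minimal NumPy sketch of that identity (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix whose first column is c with x.

    Circulant matvec == circular convolution, so the product can be
    computed in the frequency domain: C @ x = ifft(fft(c) * fft(x)).
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Reference check against the explicitly materialized circulant matrix.
n = 4
c = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([0.5, -1.0, 2.0, 0.0])
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(C @ x, circulant_matvec(c, x))
```

Storing only the first column of each block is what yields the memory reduction; on an FPGA the FFTs map naturally to dedicated butterfly units.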
Sparse-YOLO: Hardware/Software Co-Design of an FPGA Accelerator for YOLOv2
TLDR
A novel sparse convolution algorithm is extended to the YOLOv2 framework, and a resource-efficient FPGA accelerator architecture based on asynchronously executed parallel convolution cores is developed, which reduces the computational workload of the YOLOv2 algorithm by 7 times.
Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators
TLDR
A mixed-precision quantization scheme for ReRAM-based DNN inference accelerators is proposed, in which weight quantization, input quantization, and partial-sum quantization are jointly applied for each DNN layer, along with an automated quantization flow powered by deep reinforcement learning that searches the large design space for the best quantization configuration.
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs
TLDR
A novel architecture for implementing fast algorithms on FPGAs, which effectively pipelines the Winograd/FFT processing element (PE) engine and initiates multiple PEs through parallelization, together with an analytical model to predict resource usage and performance.
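The "fast algorithms" this summary refers to trade multiplications for additions. For the smallest Winograd case, F(2,3), two outputs of a 3-tap 1-D convolution cost 4 multiplies instead of 6. A self-contained sketch with the standard F(2,3) transform matrices (names are illustrative):

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices: y = A^T [(G g) * (B^T d)]
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4-element input tile, g: 3-tap filter -> 2 outputs, 4 multiplies."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
direct = np.array([d[0:3] @ g, d[1:4] @ g])  # sliding dot-product reference
assert np.allclose(winograd_f23(d, g), direct)
```

On an FPGA the four elementwise multiplies become the DSP-mapped inner stage of the PE, while the transforms are adds and shifts.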
Binary Complex Neural Network Acceleration on FPGA
  • Hongwu Peng, Shanglin Zhou, +9 authors Caiwen Ding
  • Computer Science
  • arXiv
  • 2021
TLDR
A structural-pruning-based accelerator for BCNNs that provides more than 5000 frames/s inference throughput on edge devices, together with a novel 2D convolution accelerator for the binary complex neural network.
Object Detection on FPGAs and GPUs by Using Accelerated Deep Learning
TLDR
Object detection and recognition were performed on the ZYNQ XC7Z020 development board using both the ARM processor and the FPGA, and real-time object recognition was achieved with a Movidius USB accelerator plugged externally into the FPGA.
AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers
TLDR
It is demonstrated that reconfigurable constant coefficient multipliers (RCCMs) offer a better alternative for saving silicon area than utilizing low-precision arithmetic for deep-learning applications on field-programmable gate arrays (FPGAs).
Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors
TLDR
A layer-specific design that employs different organizations optimized for the different layers, significantly outperforming previous works in throughput, off-chip access, and on-chip memory requirement.
ReBoc: Accelerating Block-Circulant Neural Networks in ReRAM
TLDR
This work designs an accelerator named ReBoc for block-circulant DNNs in ReRAM, reaping the benefits of lightweight models and efficient in-situ processing simultaneously, and proposes a novel mapping scheme that uses Horizontal Weight Slicing and Intra-Crossbar Weight Duplication to map block-circulant DNN models onto ReRAM crossbars with significantly improved crossbar utilization.
Optimizing energy efficiency of CNN-based object detection with dynamic voltage and frequency scaling
TLDR
This work develops a DVFS framework on FPGAs, applies it to SkyNet, a state-of-the-art neural network targeting object detection, and analyzes the impact of DVFS on CNNs in terms of performance, power, energy efficiency, and accuracy.
A Systematic Assessment of Embedded Neural Networks for Object Detection
TLDR
A comprehensive and fair comparison of the best-in-class convolutional neural networks (CNNs) for real-time embedded systems, detailing the effort made to achieve an unbiased characterization on cutting-edge systems-on-chip.

References

Showing 1–10 of 69 references
From model to FPGA: Software-hardware co-design for efficient neural network acceleration
TLDR
A complete design flow is proposed to achieve both fast deployment and high energy efficiency for accelerating neural networks on FPGAs, and two architecture designs, one for CNNs and one for DNNs/RNNs, are introduced together with the compilation environment.
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
TLDR
FINN, a framework for building fast and flexible FPGA accelerators, is presented; it uses a flexible heterogeneous streaming architecture that implements fully connected, convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements.
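The efficiency of binarized accelerators like FINN comes from a standard BNN identity: with weights and activations constrained to {-1, +1} and bit-packed (+1 encoded as 1, -1 as 0), a dot product reduces to XNOR plus popcount. A minimal sketch of that reduction (encoding and names are my own illustration):

```python
def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors of length n, each packed into
    the low n bits of an int (+1 -> bit 1, -1 -> bit 0).

    Matching bits contribute +1, mismatching bits -1, so the result is
    2 * popcount(XNOR(a, w)) - n.
    """
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # 1 where bits agree
    return 2 * bin(xnor).count("1") - n

# [+1, -1, +1] . [+1, +1, -1] = 1 - 1 - 1 = -1
assert binary_dot(0b101, 0b011, 3) == -1
```

In hardware, the XNOR is a LUT-level operation and the popcount is an adder tree, which is why per-layer compute can be scaled so finely.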
SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs
  • L. Lu, Yun Liang
  • Computer Science
  • 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)
  • 2018
TLDR
This paper introduces a sparse Winograd convolution accelerator (SpWA) combining these two orthogonal approaches on FPGAs, and designs an efficient architecture to implement SpWA using a line-buffer design and a Compressed-Sparse-Column-format-based processing element.
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs
TLDR
This paper proposes a novel architecture for implementing the Winograd algorithm on FPGAs, along with an analytical model to predict resource usage and reason about performance, and uses the model to guide a fast design space exploration.
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
TLDR
This paper presents an in-depth analysis of state-of-the-art CNN models, shows that convolutional layers are computation-centric while fully connected layers are memory-centric, and proposes a CNN accelerator design on an embedded FPGA for ImageNet large-scale image classification.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
  • Song Han, Xingyu Liu, +4 authors W. Dally
  • Computer Science
  • 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
TLDR
An energy-efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing, running 189x and 13x faster than CPU and GPU implementations of the same DNN without compression.
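The two ideas this summary combines, compressed-sparse column storage plus weight sharing (each nonzero stores a small codebook index rather than a full-precision value), can be sketched in a few lines. This is an illustrative software analogue with invented names, not EIE's hardware datapath:

```python
import numpy as np

def shared_sparse_matvec(col_ptr, row_idx, codes, codebook, x, n_rows):
    """y = W @ x for W stored in CSC form with weight sharing.

    col_ptr[j]..col_ptr[j+1] index the nonzeros of column j;
    row_idx gives their rows, codes their codebook indices.
    Zero activations are skipped, as EIE-style engines do.
    """
    y = np.zeros(n_rows)
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += codebook[codes[k]] * xj
    return y

# Tiny 3x3 example: nonzeros W[0,0]=0.5 and W[2,1]=-1.0, column 2 empty.
col_ptr = [0, 1, 2, 2]
row_idx = [0, 2]
codes = [0, 1]
codebook = np.array([0.5, -1.0])
y = shared_sparse_matvec(col_ptr, row_idx, codes, codebook,
                         np.array([2.0, 3.0, 4.0]), n_rows=3)
```

Because `codes` are a few bits wide, weight storage shrinks even beyond what sparsity alone provides; the codebook lookup is a tiny on-chip table.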
Fixed Point Quantization of Deep Convolutional Networks
TLDR
This paper proposes a quantizer design for fixed-point implementation of DCNs, formulates and solves an optimization problem to identify the optimal fixed-point bit-width allocation across DCN layers, and demonstrates that fine-tuning can further enhance the accuracy of fixed-point DCNs beyond that of the original floating-point model.
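The core operation being allocated bit-widths for is simple: pick a fractional length that covers the tensor's range, round to that grid, and clip to the representable interval. A minimal sketch of such a symmetric fixed-point quantizer (a generic illustration under my own naming, not the paper's exact formulation):

```python
import numpy as np

def quantize_fixed_point(w, total_bits=8):
    """Quantize w to signed fixed point with total_bits bits.

    Chooses integer bits to cover max|w| (one bit for sign), spends the
    rest on the fraction, then rounds and clips to the grid.
    Returns (dequantized weights, fractional length).
    """
    max_abs = np.max(np.abs(w))
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = total_bits - 1 - int_bits
    step = 2.0 ** (-frac_bits)
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(w / step), qmin, qmax)
    return q * step, frac_bits

w = np.array([0.42, -0.17, 0.93, -0.88])
wq, fl = quantize_fixed_point(w, total_bits=8)
```

The per-layer optimization in the paper then amounts to choosing `total_bits` (and the integer/fraction split) layer by layer to balance quantization error against hardware cost.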
A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for an FPGA
TLDR
This paper implements a pipeline-based architecture for the lightweight YOLOv2, which consists of a binarized CNN for feature extraction and parallel support vector regression (SVR) for both classification and localization.
Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs
TLDR
This paper implements CNN inference on an FPGA using a systolic array architecture, which can achieve a high clock frequency under high resource utilization, and provides an analytical model for performance and resource utilization along with an automatic design space exploration framework.
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
TLDR
This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms prior designs.