Low Precision Floating Point Arithmetic for High Performance FPGA-based CNN Acceleration
@inproceedings{Wu2020LowPF,
  title     = {Low Precision Floating Point Arithmetic for High Performance FPGA-based CNN Acceleration},
  author    = {Chen Wu and Mingyu Wang and Xinyuan Chu and Kun Wang and Lei He},
  booktitle = {Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
  year      = {2020}
}
Low-precision data representation is important for reducing storage size and memory access in convolutional neural networks (CNNs). Yet existing methods have two major limitations: (1) they require re-training to maintain accuracy for deep CNNs, and (2) they need 16-bit floating point or 8-bit fixed point to achieve good accuracy. In this paper, we propose a low-precision (8-bit) floating-point (LPFP) quantization method for FPGA-based acceleration to overcome the above limitations. Without any re…
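The abstract's core idea, quantizing values to an 8-bit floating-point format, can be illustrated with a small sketch. Note the 1-sign/4-exponent/3-mantissa split below is an assumption for illustration; the snippet does not state the paper's actual bit allocation, and overflow saturation is omitted for brevity.

```python
import numpy as np

def quantize_lpfp(x, exp_bits=4, man_bits=3):
    """Quantize a value to a hypothetical 8-bit float:
    1 sign bit, exp_bits exponent bits, man_bits mantissa bits (assumed split)."""
    if x == 0:
        return 0.0
    sign = np.sign(x)
    mag = abs(x)
    exp = np.floor(np.log2(mag))
    bias = 2 ** (exp_bits - 1) - 1
    # Clamp to the representable exponent range (subnormals and saturation omitted).
    exp = np.clip(exp, -bias, bias)
    # Round the significand to man_bits fractional bits.
    frac = mag / 2.0 ** exp
    frac = np.round(frac * 2 ** man_bits) / 2 ** man_bits
    return float(sign * frac * 2.0 ** exp)
```

For example, `quantize_lpfp(0.1)` snaps 0.1 to the nearest representable value (0.1015625 with this split), showing the rounding error an 8-bit float introduces relative to float32.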
3 Citations
MP-OPU: A Mixed Precision FPGA-based Overlay Processor for Convolutional Neural Networks
- Computer Science, 2021 31st International Conference on Field-Programmable Logic and Applications (FPL)
- 2021
This paper proposes a Mixed Precision FPGA-based Overlay Processor (MP-OPU) to fully leverage the advantages of mixed precision for both conventional and lightweight CNNs.
Reduced-Precision Acceleration of Radio-Astronomical Imaging on Reconfigurable Hardware
- Computer Science, IEEE Access
- 2022
This paper presents a reduced-precision implementation of the gridding component of the widely used WSClean imaging application and proposes the first custom floating-point accelerator on a Xilinx Alveo U50 FPGA using High-Level Synthesis.
Efficient Design of Low Bitwidth Convolutional Neural Networks on FPGA with Optimized Dot Product Units
- Computer Science
- 2022
Designing hardware accelerators to run the inference of convolutional neural networks (CNNs) is under intensive research. Several different architectures have been proposed along with…
References
SHOWING 1-10 OF 54 REFERENCES
High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic
- Computer Science, IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2019
An optimized block-floating-point (BFP) arithmetic is adopted in the accelerator for efficient inference of deep neural networks, improving energy and hardware efficiency by three times.
Computation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design
- Computer Science, AAAI
- 2018
The effects of word-width definitions in BFP on CNN performance without retraining are verified, and a noise-to-signal ratio (NSR) upper bound is developed, which provides promising guidance for BFP-based CNN engine design.
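The BFP scheme and NSR metric discussed in the two entries above can be sketched concretely: all values in a block share one exponent, each value keeps only a fixed-point mantissa, and the NSR measures the resulting quantization error. The 8-bit mantissa and 64-element block size below are illustrative assumptions, not values from either paper.

```python
import numpy as np

def bfp_quantize(block, man_bits=8):
    """Quantize a block to block floating point: one shared exponent
    per block, signed fixed-point mantissas of man_bits each."""
    shared_exp = np.ceil(np.log2(np.max(np.abs(block)) + 1e-30))
    scale = 2.0 ** shared_exp / 2 ** (man_bits - 1)  # weight of one mantissa LSB
    mant = np.clip(np.round(block / scale),
                   -(2 ** (man_bits - 1)), 2 ** (man_bits - 1) - 1)
    return mant * scale

rng = np.random.default_rng(0)
block = rng.standard_normal(64)          # illustrative 64-element block
q = bfp_quantize(block)
noise = q - block
nsr = np.sum(noise ** 2) / np.sum(block ** 2)  # noise-to-signal ratio
```

Because the whole block reuses one exponent, the multipliers in a CNN accelerator reduce to fixed-point units on the mantissas, which is the efficiency source both entries cite.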
Fixed Point Implementation of Tiny-Yolo-v2 using OpenCL on FPGA
- Computer Science
- 2018
This study proposes a fixed-point (16-bit) implementation of the CNN-based object detection model Tiny-Yolo-v2 on a Cyclone V PCIe Development Kit FPGA board using the High-Level Synthesis (HLS) tool OpenCL, achieving a peak performance of 21 GOPS at a 100 MHz working frequency.
Exploration of Low Numeric Precision Deep Learning Inference Using Intel® FPGAs
- Computer Science, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
- 2018
A hardware design for FPGAs that takes advantage of the bandwidth, memory, power, and computation savings of limited-numerical-precision data, providing insights into the trade-offs between throughput and accuracy for various networks.
Scalable high-performance architecture for convolutional ternary neural networks on FPGA
- Computer Science, 2017 27th International Conference on Field Programmable Logic and Applications (FPL)
- 2017
This work presents a highly versatile, FPGA-friendly architecture for ternary neural networks (TNNs) in which both the number of input data bits and the level of parallelism can be varied at synthesis time, allowing throughput to be traded for hardware resources and power consumption.
Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs
- Computer Science, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)
- 2017
This paper implements a CNN on an FPGA using a systolic array architecture, which can achieve a high clock frequency under high resource utilization; it provides an analytical model for performance and resource utilization and develops an automatic design-space exploration framework.
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
- Computer Science, FPGA
- 2015
This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms prior designs.
Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations
- Computer Science, arXiv
- 2017
It is shown that using floating-point numbers for weights is more efficient than fixed-point representation for the same bit-width and enables compact hardware multiply-and-accumulate (MAC) unit design.
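The mixed-representation MAC described above, floating-point weights multiplied by fixed-point activations, can be sketched as follows. The Q2.6 activation format (8 bits total, 6 fractional) is an assumed split for illustration, not the configuration from the cited paper.

```python
import numpy as np

def to_fixed(x, frac_bits=6, total_bits=8):
    """Quantize activations to signed fixed point (assumed Q2.6 split)."""
    scale = 2 ** frac_bits
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

weights = np.array([0.25, -0.5, 1.5])            # kept in floating point
acts = to_fixed(np.array([0.33, 0.71, -0.12]))   # activations quantized
out = float(np.dot(weights, acts))               # multiply-accumulate (MAC)
```

Keeping weights in floating point preserves their wide dynamic range, while the fixed-point activations keep the per-element multiplier narrow, which is the hardware-efficiency argument the entry summarizes.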
Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA
- Computer Science, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- 2018
This paper proposes Angel-Eye, a programmable and flexible CNN accelerator architecture, together with a data quantization strategy and compilation tool; it achieves similar performance and delivers better energy efficiency than peer FPGA implementations on the same platform.
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs
- Computer Science, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
- 2017
This paper proposes a novel architecture for implementing the Winograd algorithm on FPGAs, along with an analytical model to predict resource usage and reason about performance; the model is then used to guide a fast design-space exploration.