Full-stack Optimization for Accelerating CNNs with FPGA Validation
@article{McDanel2019FullstackOF,
  title={Full-stack Optimization for Accelerating CNNs with FPGA Validation},
  author={Bradley McDanel and Sai Qian Zhang and H. T. Kung and Xin Dong},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.00462}
}
We present a full-stack optimization framework for accelerating inference of convolutional neural networks (CNNs) and validate the approach with field-programmable gate array (FPGA) implementations. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference latency, energy efficiency, hardware utilization, and inference accuracy. As a validation vehicle, we…
One Citation
Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs
- Computer Science · IEEE Transactions on Neural Networks and Learning Systems
- 2022
This article introduces a highly customized streaming hardware architecture that improves compute efficiency for streaming applications by providing full-stack acceleration of CNNs on FPGAs, and demonstrates high performance that outperforms state-of-the-art FPGA accelerators.
References
Showing 1-10 of 52 references
Maximizing CNN accelerator efficiency through resource partitioning
- Computer Science · 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)
- 2017
This work presents a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers.
Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks
- Computer Science · FPGA
- 2017
This work systematically explores the trade-offs in hardware cost by searching the design-variable configurations, and proposes a specific dataflow for hardware CNN acceleration that minimizes memory access and data movement while maximizing resource utilization to achieve high performance.
Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs
- Computer Science · 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)
- 2017
This paper proposes a fusion architecture that can naturally fuse multiple layers in CNNs to reuse intermediate data, and designs an optimal algorithm to determine the fusion and algorithm strategy for each layer.
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
- Computer Science · FPGA
- 2016
This paper presents an in-depth analysis of state-of-the-art CNN models, shows that convolutional layers are computation-centric while fully-connected layers are memory-centric, and proposes a CNN accelerator design on embedded FPGAs for ImageNet large-scale image classification.
Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA
- Computer Science · 2018 28th International Conference on Field Programmable Logic and Applications (FPL)
- 2018
This work proposes a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes, which facilitates the design space exploration and simplifies the tradeoff between network accuracy and computation efficiency.
Fused-layer CNN accelerators
- Computer Science · 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
- 2016
This work finds a previously unexplored dimension in the design space of CNN accelerators: the dataflow across convolutional layers. By modifying the order in which input data are brought on chip, it fuses the processing of multiple CNN layers, enabling caching of intermediate data between the evaluation of adjacent layers.
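The layer-fusion idea described above can be illustrated with a minimal 1-D sketch (my own toy example, not the paper's 2-D pyramid tiling): each output tile of the second layer is computed directly from the input pixels it depends on, so the full intermediate feature map is never written back to off-chip memory.

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain 'valid' 1-D convolution (cross-correlation, as in CNNs)."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def fused_two_layer(x, k1, k2, tile=4):
    """Compute two stacked 1-D convolutions tile-by-tile, fusing the layers.

    For each output tile of layer 2, we load only the input tile plus a
    'halo' of extra pixels; the layer-1 intermediate stays in local (on-chip)
    storage and is discarded after use.
    """
    halo = (len(k1) - 1) + (len(k2) - 1)   # extra input pixels each tile needs
    out_len = len(x) - halo
    out = np.empty(out_len)
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        x_tile = x[start:stop + halo]       # input tile plus halo
        mid = conv1d_valid(x_tile, k1)      # intermediate never leaves "chip"
        out[start:stop] = conv1d_valid(mid, k2)
    return out

# fused and unfused evaluation agree on the same input
x = np.random.randn(12)
k1 = np.random.randn(3)
k2 = np.random.randn(3)
fused = fused_two_layer(x, k1, k2)
unfused = conv1d_valid(conv1d_valid(x, k1), k2)
```

The trade-off, as in the paper, is recomputation/halo overhead at tile borders in exchange for avoiding off-chip traffic for intermediate feature maps.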
ShiDianNao: Shifting vision processing closer to the sensor
- Computer Science · 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)
- 2015
This paper proposes an accelerator that is 60x more energy-efficient than the previous state-of-the-art neural network accelerator; designed down to the layout at 65 nm, it has a modest footprint, consumes only 320 mW, and is still about 30x faster than high-end GPUs.
Quantizing deep convolutional networks for efficient inference: A whitepaper
- Computer Science · ArXiv
- 2018
An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, and it is recommended that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization.
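The recommended scheme can be sketched in a few lines of NumPy: symmetric int8 quantization with one scale per output channel for weights and a single scale per tensor for activations. This is my own minimal illustration of the general idea, not code from the whitepaper.

```python
import numpy as np

def quantize_per_channel(w, num_bits=8):
    """Symmetric per-channel quantization of conv weights.

    w: float weights of shape (out_channels, in_channels, kh, kw).
    Each output channel gets its own scale, which preserves accuracy
    when channels have very different dynamic ranges.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    scales = np.abs(w).reshape(w.shape[0], -1).max(axis=1) / qmax
    scales = np.maximum(scales, 1e-8)  # guard against all-zero channels
    q = np.round(w / scales[:, None, None, None]).astype(np.int8)
    return q, scales

def quantize_per_layer(x, num_bits=8):
    """Symmetric per-layer (per-tensor) quantization of activations."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(np.abs(x).max() / qmax, 1e-8)
    return np.round(x / scale).astype(np.int8), scale

# dequantize to check the reconstruction error stays within half a scale step
w = np.random.randn(16, 3, 3, 3).astype(np.float32)
qw, s = quantize_per_channel(w)
w_hat = qw.astype(np.float32) * s[:, None, None, None]
```

Per-channel weight scales cost almost nothing in hardware (one multiplier constant per output channel), while per-tensor activation scales keep the accumulator path uniform, which is why this split is kernel-friendly.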
Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
- Computer Science · ASPLOS
- 2019
This paper describes a novel approach to packing sparse convolutional neural networks into a denser format for efficient systolic-array implementations, and demonstrates that, to mitigate data-privacy concerns, the retraining can be accomplished with only a fraction of the original dataset.
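The core packing idea (column combining) can be sketched as follows. This is a simplified greedy illustration of my own, assuming fixed column groups and keeping only the largest-magnitude entry per row within a group; the paper jointly optimizes which entries to prune during retraining.

```python
import numpy as np

def column_combine(w, group_size=4):
    """Greedy column combining for a sparse 2-D weight matrix (simplified).

    Columns are grouped so that, within each group, at most one column
    contributes a nonzero in any given row; conflicting (smaller-magnitude)
    entries are pruned. Returns the packed dense matrix plus, for each
    packed entry, the index of the source column (-1 where the whole
    group row is zero).
    """
    rows, cols = w.shape
    packed, origin = [], []
    for start in range(0, cols, group_size):
        group = w[:, start:start + group_size]
        keep = np.abs(group).argmax(axis=1)          # survivor per row
        col = group[np.arange(rows), keep]           # pruned, combined column
        src = np.where(col != 0, start + keep, -1)   # provenance for indexing
        packed.append(col)
        origin.append(src)
    return np.stack(packed, axis=1), np.stack(origin, axis=1)

# a small sparse weight matrix: 3 rows, 4 columns, one group
w = np.array([[2.0, -1.0, 0.0, 0.0],
              [0.0,  0.0, 3.0, 0.0],
              [0.0,  0.0, 0.0, 0.0]])
dense, origin = column_combine(w, group_size=4)
```

After combining, a systolic array multiplies against the dense packed matrix at full utilization, using the `origin` indices to select the matching input element per row.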
Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
- Computer Science · ECCV
- 2018
A novel model, dubbed Bi-Real net, is proposed, which connects the real-valued activations (after the 1-bit convolution and/or BatchNorm layer, before the sign function) to the activations of the consecutive block through an identity shortcut; it achieves up to 10% higher top-1 accuracy with greater memory savings and lower computational cost.
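The shortcut structure described above can be sketched in a few lines; this is a bare-bones illustration of my own (BatchNorm and the paper's training tricks omitted), with a toy 1x1 "convolution" standing in for a real binary conv layer.

```python
import numpy as np

def sign(x):
    """Binarize to -1/+1 (the 1-bit activation function)."""
    return np.where(x >= 0, 1.0, -1.0)

def bireal_block(x_real, binary_conv):
    """One simplified Bi-Real-style block.

    The key idea: the real-valued activations entering the block are added
    to the block's output through an identity shortcut, so the information
    discarded by the sign function is still visible downstream, raising the
    representational capability of the 1-bit network.
    """
    x_bin = sign(x_real)      # binarized activations feed the 1-bit conv
    y = binary_conv(x_bin)    # 1-bit convolution (real-valued output)
    return y + x_real         # identity shortcut from the real activations

# toy stand-in for a binary conv: a matmul with binarized (1-bit) weights
w = np.random.randn(4, 4)
binary_conv = lambda x: x @ sign(w)
x = np.random.randn(2, 4)
out = bireal_block(x, binary_conv)
```

Because both weights and activations inside the conv are constrained to -1/+1, the multiply-accumulates reduce to XNOR/popcount operations in hardware, while the shortcut adds only cheap real-valued additions.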