Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster


Recently, FPGA-based CNN accelerators have demonstrated superior energy efficiency compared to high-performance devices such as GPGPUs. However, because of limited on-chip resources and other factors, single-board FPGA designs may struggle to achieve optimal energy efficiency. In this paper, we present a deeply pipelined multi-FPGA architecture that expands the design space for optimal performance and energy efficiency. A dynamic programming algorithm is proposed to map the CNN computing layers efficiently to different FPGA boards. To demonstrate the potential of the architecture, we built a prototype system with seven FPGA boards connected by high-speed serial links. The experimental results on AlexNet and VGG-16 show that the prototype can achieve up to 21x and 2x the energy efficiency of optimized multi-core CPU and GPU implementations, respectively.
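
The abstract does not describe the dynamic programming formulation itself, so the following is only an illustrative sketch of how such a layer-to-board mapping could look, not the authors' algorithm. It assumes each layer has a single estimated latency, that boards process consecutive layers as pipeline stages, and that the goal is to minimize the slowest stage. Resource limits, inter-board link bandwidth, and energy are ignored. The function name `partition_layers` and the cost values are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple


def partition_layers(costs: List[float], boards: int) -> Tuple[float, List[List[int]]]:
    """Split consecutive layers into at most `boards` pipeline stages,
    minimizing the cost of the slowest stage (the pipeline bottleneck).

    Assumption: `costs[i]` is an estimated latency for layer i on one board.
    """
    n = len(costs)
    if n == 0 or boards < 1:
        raise ValueError("need at least one layer and one board")

    # prefix[j] - prefix[i] is the total cost of layers i..j-1.
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i: int, k: int) -> Tuple[float, Tuple[int, ...]]:
        """Best bottleneck for layers i..n-1 using at most k boards.

        Returns (bottleneck, stage end indices)."""
        if i == n:
            return 0.0, ()
        if k == 1:
            # Only one board left: it must take all remaining layers.
            return prefix[n] - prefix[i], (n,)
        best_cost, best_cuts = float("inf"), ()
        for j in range(i + 1, n + 1):
            stage = prefix[j] - prefix[i]      # layers i..j-1 on this board
            rest, cuts = best(j, k - 1)        # remaining layers on later boards
            cand = max(stage, rest)
            if cand < best_cost:
                best_cost, best_cuts = cand, (j,) + cuts
        return best_cost, best_cuts

    bottleneck, cuts = best(0, boards)

    # Turn the stage end indices into explicit lists of layer indices.
    stages, start = [], 0
    for end in cuts:
        stages.append(list(range(start, end)))
        start = end
    return bottleneck, stages


if __name__ == "__main__":
    # Hypothetical per-layer latencies for a five-layer network on three boards.
    bottleneck, stages = partition_layers([4, 2, 3, 7, 1], boards=3)
    print(bottleneck, stages)
```

In this toy model, throughput is set by the slowest stage, which is why the objective is the maximum stage cost rather than the sum. A realistic mapping would also need to account for the factors the sketch leaves out.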

DOI: 10.1145/2934583.2934644

Cite this paper

@inproceedings{Zhang2016EnergyEfficientCI,
  title={Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster},
  author={Chen Zhang and Di Wu and Jiayu Sun and Guangyu Sun and Guojie Luo and Jason Cong},
  booktitle={ISLPED},
  year={2016}
}