Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks


Because convolution layers account for most of the operations in convolutional neural network (CNN) algorithms, an effective convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution in CNNs involves three-dimensional multiply-and-accumulate (MAC) operations with four levels of loops, which results in…
DOI: 10.1145/3020078.3021736
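
The abstract describes convolution as three-dimensional MAC operations nested in four levels of loops. A minimal sketch of such a loop nest follows, for illustration only. The function name `conv_layer`, the array layouts, the `stride` parameter, and the way the four levels are numbered in the comments are all assumptions made here, since the abstract is truncated before it gives details. The sketch ignores padding and bias and is plain software, not a model of any hardware dataflow.

```python
def conv_layer(inputs, weights, stride=1):
    """Naive convolution written as an explicit loop nest (no padding, no bias).

    Assumed layouts (hypothetical, chosen for this sketch):
      inputs:  [n_if][height][width]     input feature maps
      weights: [n_of][n_if][k_y][k_x]    one 3-D kernel per output feature map
      returns: [n_of][h_out][w_out]      output feature maps
    """
    n_if = len(inputs)
    height, width = len(inputs[0]), len(inputs[0][0])
    n_of = len(weights)
    k_y, k_x = len(weights[0][0]), len(weights[0][0][0])
    h_out = (height - k_y) // stride + 1
    w_out = (width - k_x) // stride + 1

    outputs = [[[0] * w_out for _ in range(h_out)] for _ in range(n_of)]

    for of in range(n_of):                    # level 4: across output feature maps
        for y in range(h_out):                # level 3: scan over output rows...
            for x in range(w_out):            #          ...and output columns
                acc = 0
                for c in range(n_if):         # level 2: across input feature maps
                    for ky in range(k_y):     # level 1: MACs inside one kernel window
                        for kx in range(k_x):
                            pixel = inputs[c][y * stride + ky][x * stride + kx]
                            acc += pixel * weights[of][c][ky][kx]
                outputs[of][y][x] = acc
    return outputs


if __name__ == "__main__":
    image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # one 3x3 input feature map
    kernel = [[[[1, 1], [1, 1]]]]                 # one 2x2 kernel of ones
    print(conv_layer(image, kernel))              # [[[12, 16], [24, 28]]]
```

Each innermost iteration performs one multiply and one accumulate, so the total MAC count is the product of all the loop bounds. That product is why the ordering and partitioning of these loops matters for an accelerator.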

12 Figures and Tables


