Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA

@inproceedings{Ma2016ScalableAM,
  title={Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA},
  author={Y. Ma and Naveen Suda and Yu Cao and Jae-sun Seo and S. Vrudhula},
  booktitle={2016 26th International Conference on Field Programmable Logic and Applications (FPL)},
  year={2016},
  pages={1--8}
}
Despite its popularity, deploying Convolutional Neural Networks (CNNs) on a portable system is still challenging due to large data volume, intensive computation and frequent memory access. Although previous FPGA acceleration schemes generated by high-level synthesis tools (e.g., HLS, OpenCL) have allowed for fast design optimization, hardware inefficiency still exists when allocating FPGA resources to maximize parallelism and throughput. A direct hardware-level design (i.e., RTL) can improve…
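To make "intensive computation" and "large data volume" concrete, the sketch below counts multiply-accumulate (MAC) operations, weights and output pixels for a single convolution layer. It is an illustration under stated assumptions, not the RTL compiler or design method described in the paper: the function name conv_layer_cost and its parameter names are hypothetical, and it assumes a dense convolution with a square kernel, no grouping and no bias terms.

```python
def conv_layer_cost(n_if, n_of, n_ix, n_iy, k, stride=1, pad=0):
    """Count work and data for one dense 2-D convolution layer (illustrative).

    n_if/n_of: number of input/output feature maps; n_ix/n_iy: input
    width/height; k: square kernel size. Returns (MACs, weights, outputs).
    """
    # Output spatial size for a standard strided, zero-padded convolution.
    n_ox = (n_ix - k + 2 * pad) // stride + 1
    n_oy = (n_iy - k + 2 * pad) // stride + 1
    # Every output pixel of every output map needs n_if * k * k MACs.
    macs = n_of * n_ox * n_oy * n_if * k * k
    # One k x k kernel per (input map, output map) pair; biases ignored.
    weights = n_of * n_if * k * k
    # Total output activations written back by the layer.
    outputs = n_of * n_ox * n_oy
    return macs, weights, outputs


if __name__ == "__main__":
    # Hypothetical example layer: 64 -> 64 maps, 224 x 224, 3 x 3, stride 1, pad 1.
    macs, weights, outputs = conv_layer_cost(64, 64, 224, 224, 3, stride=1, pad=1)
    print(f"MACs={macs:,} weights={weights:,} outputs={outputs:,}")
    # prints MACs=1,849,688,064 weights=36,864 outputs=3,211,264
```

The example shape matches VGG-16's conv1_2 layer and is chosen only for illustration. A single such layer needs roughly 1.85 billion MACs while holding only 36,864 weights, so each weight is reused tens of thousands of times; how that reuse is mapped onto on-chip buffers and parallel multipliers is the kind of resource-allocation question the abstract raises.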
102 Citations
ALAMO: FPGA acceleration of deep learning algorithms with a modularized RTL compiler
  • 37 citations
An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks
  • 70 citations
Toolflows for Mapping Convolutional Neural Networks on FPGAs
  • 91 citations
Fast generation of high throughput customized deep learning accelerators on FPGAs
  • 14 citations
CNN2Gate: Toward Designing a General Framework for Implementation of Convolutional Neural Networks on FPGA
  • 3 citations
  • Highly Influenced
Optimising Convolutional Neural Networks for Reconfigurable Acceleration
  • Highly Influenced
CaFPGA: An automatic generation model for CNN accelerator
  • 11 citations
Acceleration of Deep Learning on FPGA
  • 2 citations
...

References

27 references (first 10 listed)
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
  • 349 citations
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
  • 1,196 citations
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
  • 735 citations
  • Highly Influential
14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks
  • 257 citations
DaDianNao: A Machine-Learning Supercomputer
  • Yunji Chen, Tao Luo, +8 authors O. Temam
  • Computer Science
  • 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
  • 2014
  • 921 citations
A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
  • 227 citations
Caffe: Convolutional Architecture for Fast Feature Embedding
  • 12,726 citations
NeuFlow: A runtime reconfigurable dataflow processor for vision
  • 324 citations
Hardware accelerated convolutional neural networks for synthetic vision systems
  • 214 citations
  • Highly Influential
Deep Learning with Limited Numerical Precision
  • 1,235 citations
...