Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs
@article{Li2016OptimizingME, title={Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs}, author={C. Li and Yi Yang and Min Feng and S. Chakradhar and Huiyang Zhou}, journal={SC16: International Conference for High Performance Computing, Networking, Storage and Analysis}, year={2016}, pages={633-644} }
Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massively parallel computing capability of GPUs makes them one of the ideal platforms to accelerate CNNs, and a number of GPU-based CNN libraries have been developed. While existing works mainly focus on the computational efficiency of CNNs, the memory efficiency of CNNs…
45 Citations
Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs
- Computer Science
- Euro-Par
- 2020
- 1
DeLTA: GPU Performance Model for Deep Learning Applications with In-Depth Memory System Traffic Analysis
- Computer Science
- 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
- 2019
- 5
TurboDL: Improving the CNN Training on GPU With Fine-Grained Multi-Streaming Scheduling
- Computer Science
- IEEE Transactions on Computers
- 2021
Design of a Convolutional Neural Network Instruction Set Based on RISC-V and Its Microarchitecture Implementation
- Computer Science
- ICA3PP
- 2020
Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels
- Computer Science
- IEEE Transactions on Computers
- 2020
- 1
A History-Based Auto-Tuning Framework for Fast and High-Performance DNN Design on GPU
- Computer Science
- 2020 57th ACM/IEEE Design Automation Conference (DAC)
- 2020