Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

@inproceedings{Li2016OptimizingME,
  title={Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs},
  author={C. Li and Yi Yang and Min Feng and S. Chakradhar and Huiyang Zhou},
  booktitle={SC16: International Conference for High Performance Computing, Networking, Storage and Analysis},
  year={2016},
  pages={633--644}
}
  • C. Li, Yi Yang, Min Feng, S. Chakradhar, Huiyang Zhou
  • Published 2016
  • Computer Science
  • SC16: International Conference for High Performance Computing, Networking, Storage and Analysis
Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Because they perform substantial numbers of compute and memory operations, however, they require significant execution time. The massive parallel computing capability of GPUs makes them one of the ideal platforms for accelerating CNNs, and a number of GPU-based CNN libraries have been developed. While existing works mainly focus on the computational efficiency of CNNs, the memory efficiency of CNNs…
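The abstract attributes CNN execution time to both compute and memory operations. The following is a rough illustration only, not the paper's model. It is a minimal sketch that counts FLOPs and an idealized lower bound on off-chip bytes for a single direct convolution layer. It assumes dense float32 tensors, with each input, filter, and output element read or written exactly once. The function names and the example layer shape are hypothetical.

```python
def conv_layer_cost(n, c, h, w, k, r, s, stride=1, pad=0, bytes_per_elem=4):
    """Return (flops, min_bytes) for one direct convolution layer.

    n: batch size, c: input channels, h/w: input height/width,
    k: number of filters, r/s: filter height/width.
    min_bytes assumes every input, filter, and output element crosses
    off-chip memory exactly once: an idealized lower bound, not a
    measurement of any real GPU kernel.
    """
    out_h = (h + 2 * pad - r) // stride + 1
    out_w = (w + 2 * pad - s) // stride + 1
    # Each output element is a dot product of length c*r*s:
    # one multiply and one add per term, i.e. 2*c*r*s FLOPs.
    flops = 2 * n * k * out_h * out_w * c * r * s
    # Element counts of the input, filter, and output tensors.
    elems = n * c * h * w + k * c * r * s + n * k * out_h * out_w
    return flops, elems * bytes_per_elem


def arithmetic_intensity(flops, min_bytes):
    """FLOPs per byte of off-chip traffic; low values suggest a memory-bound layer."""
    return flops / min_bytes


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    f, b = conv_layer_cost(n=32, c=64, h=56, w=56, k=64, r=3, s=3, pad=1)
    print(f"{f:,} FLOPs, {b:,} bytes, {arithmetic_intensity(f, b):.1f} FLOP/byte")
```

Real kernels usually move more data than this bound because on-chip reuse is limited. This is one reason why memory efficiency, and not only FLOP counts, can dominate a layer's execution time.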