Run-Time Efficient RNN Compression for Inference on Edge Devices

@article{Thakker2019RunTimeER,
  title={Run-Time Efficient RNN Compression for Inference on Edge Devices},
  author={Urmish Thakker and Jesse G. Beu and Dibakar Gope and Ganesh S. Dasika and Matthew Mattina},
  journal={2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2)},
  year={2019},
  pages={26-30}
}
  • Urmish Thakker, Jesse G. Beu, Dibakar Gope, Ganesh S. Dasika, Matthew Mattina
  • Published 2019
  • Computer Science, Mathematics
  • 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2)
Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves…
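The abstract shown here is truncated, but the core idea of HMD is a hybrid weight matrix: part of the matrix stays dense and unconstrained while the remainder is restricted to low rank, so the whole layer still reduces to dense matrix-vector products. Below is a minimal NumPy sketch of that idea, assuming a truncated-SVD factorization for the low-rank block; the function names (hmd_compress, hmd_matvec), the row split, and the rank are illustrative choices, not the authors' implementation.

import numpy as np

def hmd_compress(W, num_dense_rows, rank):
    # Hypothetical HMD sketch: keep the top rows of W dense and
    # approximate the remaining rows with a rank-`rank` factorization.
    dense = W[:num_dense_rows, :].copy()          # unconstrained upper block
    lower = W[num_dense_rows:, :]                 # block to be compressed
    u, s, vt = np.linalg.svd(lower, full_matrices=False)
    left = u[:, :rank] * s[:rank]                 # (m2, r) factor
    right = vt[:rank, :]                          # (r, n) factor
    return dense, left, right

def hmd_matvec(dense, left, right, x):
    # y = W_hat @ x: one dense GEMV for the upper block plus two skinny
    # GEMVs for the low-rank block, so no sparse or irregular kernels.
    y_top = dense @ x
    y_bottom = left @ (right @ x)
    return np.concatenate([y_top, y_bottom])

# Example: compress a 256x256 recurrent weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
dense, left, right = hmd_compress(W, num_dense_rows=128, rank=8)
print(f"compression: {W.size / (dense.size + left.size + right.size):.1f}x")

Because every piece maps onto an ordinary dense GEMV, a cell compressed this way can keep using vendor BLAS kernels, which is consistent with the paper's emphasis on compression that does not hurt inference run-time (in contrast to unstructured pruning, whose sparse kernels are often slower on edge CPUs).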
15 Citations
Pushing the limits of RNN Compression
Rank and run-time aware compression of NLP Applications
Compressing RNNs for IoT devices by 15-38x using Kronecker Products
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
Symmetric $k$-Means for Deep Neural Network Compression and Hardware Acceleration on FPGAs
Ternary MobileNets via Per-Layer Hybrid Filter Banks
Compressing Language Models using Doped Kronecker Products
