Mixed-Precision Embedding Using a Cache
@article{Yang2020MixedPrecisionEU,
  title   = {Mixed-Precision Embedding Using a Cache},
  author  = {Jie Yang and Jianyu Huang and Jongsoo Park and P. Tang and A. Tulloch},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2010.11305}
}
In recommendation systems, practitioners have observed that an increase in the number of embedding tables and their sizes often leads to significant improvements in model performance. Given this and the business importance of these models to major internet companies, embedding tables for personalization tasks have grown to terabyte scale and continue to grow at a significant rate. Meanwhile, these large-scale models are often trained with GPUs, where high-performance memory is a scarce resource, thus…
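To illustrate the general idea behind the title (not the authors' actual algorithm, which the truncated abstract does not detail), here is a minimal hypothetical sketch: the embedding table is stored in low precision (int8 with per-row scales) to save memory, while a small LRU cache holds recently touched rows in full fp32 precision so that hot rows are read and updated without repeated quantization. The class name, cache policy, and quantization scheme are all illustrative assumptions.

```python
import numpy as np
from collections import OrderedDict


class CachedMixedPrecisionEmbedding:
    """Hypothetical sketch: an int8-quantized embedding table backed by a
    small fp32 LRU cache for frequently accessed rows. Illustrates the
    general mixed-precision-with-cache idea only."""

    def __init__(self, num_rows, dim, cache_size=4, seed=0):
        rng = np.random.default_rng(seed)
        full = rng.standard_normal((num_rows, dim)).astype(np.float32)
        # Row-wise symmetric int8 quantization of the backing table.
        self.scale = np.abs(full).max(axis=1, keepdims=True) / 127.0
        self.table_int8 = np.round(full / self.scale).astype(np.int8)
        self.cache_size = cache_size
        self.cache = OrderedDict()  # row index -> fp32 row

    def lookup(self, idx):
        # Hot rows are served from the fp32 cache; cold rows are
        # dequantized from the int8 table and promoted into the cache.
        if idx in self.cache:
            self.cache.move_to_end(idx)
            return self.cache[idx]
        row = self.table_int8[idx].astype(np.float32) * self.scale[idx]
        self.cache[idx] = row
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return row

    def update(self, idx, grad, lr=0.1):
        # Gradient updates are applied in full precision...
        row = self.lookup(idx) - lr * grad
        self.cache[idx] = row
        # ...and written back quantized to the int8 backing table.
        self.scale[idx] = max(np.abs(row).max() / 127.0, 1e-8)
        self.table_int8[idx] = np.round(row / self.scale[idx]).astype(np.int8)
        return row
```

The design point this sketch captures is that only the (small) cache pays the fp32 memory cost, while the full table lives at 4x lower precision; the cache absorbs the repeated reads and updates that skewed access distributions in recommendation workloads produce.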