Corpus ID: 225041191

Mixed-Precision Embedding Using a Cache

  • Jie Yang, Jianyu Huang, Jongsoo Park, P. Tang, A. Tulloch
  • Published 2020
  • Computer Science
  • ArXiv
  • In recommendation systems, practitioners have observed that increasing the number of embedding tables and their sizes often leads to significant improvements in model performance. Given this and the business importance of these models to major internet companies, embedding tables for personalization tasks have grown to terabyte scale and continue to grow at a significant rate. Meanwhile, these large-scale models are often trained with GPUs where high-performance memory is a scarce resource, thus…


    Post-Training 4-bit Quantization on Embedding Tables
    • 1
    Compression-aware Training of Deep Networks
    • 89
    Block based Singular Value Decomposition approach to matrix factorization for recommender systems
    • 5
    Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems
    • 15
    Training with Quantization Noise for Extreme Model Compression
    • 19
    Training and Inference with Integers in Deep Neural Networks
    • 187