Corpus ID: 221082162

On Efficient Constructions of Checkpoints

  • Y. Chen, Z. Liu, Bin Ren, Xin Jin
  • Published in ICML 2020
  • Computer Science, Mathematics
  • Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models. In this paper, we propose a lossy compression scheme for checkpoint constructions (called LC-Checkpoint). LC-Checkpoint simultaneously maximizes the compression rate and optimizes the recovery speed, under the assumption that SGD is used to train the model. LC-Checkpoint uses quantization and priority promotion to store the most crucial information for SGD to recover, and then…