A novel memory-efficient deep learning training framework via error-bounded lossy compression

@article{Jin2021ANM,
  title={A novel memory-efficient deep learning training framework via error-bounded lossy compression},
  author={Sian Jin and Guanpeng Li and Shuaiwen Song and Dingwen Tao},
  journal={Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
  year={2021}
}
  • Sian Jin, Guanpeng Li, Shuaiwen Song, Dingwen Tao
  • Published 18 November 2020
  • Computer Science
  • Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
DNNs are becoming increasingly deeper, wider, and nonlinear due to the growing demands for prediction accuracy and analysis quality. When training a DNN model, the intermediate activation data must be saved in memory during forward propagation and then restored for backward propagation. Traditional memory-saving techniques such as data recomputation and migration either suffer from high performance overhead or are constrained by specific interconnect technology and limited bandwidth. In… 
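
The mechanism sketched above, compressing activations between the forward and backward passes, can be illustrated in a few lines: tensors stashed for backward are routed through a lossy compressor on the way in and a decompressor on the way out. The sketch below uses PyTorch's saved-tensors hooks with a plain uniform quantizer under an absolute error bound as a stand-in for the error-bounded lossy compressor the paper builds on; the error bound, the small-tensor cutoff, and the parameter filter are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: lossy-compress activations saved for the backward pass and
# decompress them on demand. Uniform quantization with an absolute error bound
# stands in for the paper's error-bounded compressor; illustrative only.
import torch

ERROR_BOUND = 1e-2  # illustrative absolute error bound

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
)
param_ptrs = {p.data_ptr() for p in model.parameters()}  # leave weights untouched

def pack(t):
    # Compress only sizable floating-point activations, not parameters.
    if t.data_ptr() in param_ptrs or not t.is_floating_point() or t.numel() < 1024:
        return ("raw", t)
    step = 2 * ERROR_BOUND
    codes = torch.round(t / step).to(torch.int16)  # lossy: |t - t'| <= ERROR_BOUND
    return ("quantized", codes, step, t.dtype)

def unpack(packed):
    if packed[0] == "raw":
        return packed[1]
    _, codes, step, dtype = packed
    return codes.to(dtype) * step

x = torch.randn(64, 4096)

# Every tensor autograd saves for backward passes through pack/unpack.
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    loss = model(x).sum()
loss.backward()
```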

COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression

TLDR
This paper proposes a novel memory-efficient CNN training framework that leverages error-bounded lossy compression to significantly reduce the memory requirement for training in order to allow training larger models or to accelerate training.

AC-GC: Lossy Activation Compression with Guaranteed Convergence

TLDR
This paper builds upon recent developments in Stochastic Gradient Descent convergence to prove an upper bound on the expected loss increase when training with compressed activation storage, and expresses the activation compression error in terms of this bound, allowing the compression rate to adapt to training conditions automatically.
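
AC-GC's own theorem ties the expected loss increase to the activation compression error; as a generic illustration of how such a bound arises (not the paper's result), the standard descent lemma for an L-smooth loss with a perturbed gradient already has this shape:

```latex
% Generic illustration, not AC-GC's theorem. Assume f is L-smooth, the update is
% x_{t+1} = x_t - \eta g_t with g_t = \nabla f(x_t) + e_t, where e_t is the
% gradient error induced by compressed activations, and \eta \le 1/L.
% Smoothness plus Cauchy--Schwarz and Young's inequality give:
\begin{align}
  f(x_{t+1}) &\le f(x_t) + \nabla f(x_t)^\top (x_{t+1} - x_t)
                + \tfrac{L}{2}\,\lVert x_{t+1} - x_t \rVert^2 \\
             &\le f(x_t) - \tfrac{\eta}{2}\,\lVert \nabla f(x_t) \rVert^2
                + \tfrac{\eta}{2}\,\lVert e_t \rVert^2 .
\end{align}
```

Taking expectations, the per-step loss increase is controlled by E[||e_t||^2], which is the quantity a bound of this kind lets the compression rate be tuned against.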

FreeLunch: Compression-based GPU Memory Management for Convolutional Neural Networks

TLDR
This paper proposes a compression-based technique called FreeLunch that actively compresses the intermediate data to reduce the memory footprint of training large CNN models and has up to 35% less memory consumption and up to 70% better throughput than swapping and recomputation.

η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities

Recently, the recurrent neural network, or its most popular type, the Long Short Term Memory (LSTM) network, has achieved great success in a broad spectrum of real-world application domains, such as …

GACT: Activation Compressed Training for General Architectures

TLDR
GACT, an ACT (activation compressed training) framework that supports a broad range of machine learning tasks for generic NN architectures with limited domain knowledge, is presented, and the convergence of GACT is proved by analyzing a linearized version of ACT's approximate gradient.

GACT: Activation Compressed Training for Generic Network Architectures

TLDR
This paper presents GACT, an ACT framework that supports a broad range of machine learning tasks for generic NN architectures with limited domain knowledge, and proves the convergence of GACT without prior knowledge of operator type or model architecture.
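
Both GACT entries describe activation compressed training (ACT): an operator stores a compressed copy of its input activation and decompresses it only when the backward pass computes gradients. A minimal per-operator sketch of that idea for a linear layer follows; the 8-bit per-row quantizer and the class name are illustrative assumptions, not GACT's actual code.

```python
# Minimal sketch of activation compressed training (ACT) for a single operator:
# the input activation is stored as int8 with a per-row scale and dequantized
# only when backward needs it. Illustrative only; not GACT's implementation.
import torch

class CompressedLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        y = x @ weight.t()
        scale = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        ctx.save_for_backward((x / scale).round().to(torch.int8), scale, weight)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        x_q, scale, weight = ctx.saved_tensors
        x_hat = x_q.to(grad_y.dtype) * scale  # lossy reconstruction of the activation
        grad_x = grad_y @ weight              # input gradient (does not need x at all)
        grad_w = grad_y.t() @ x_hat           # weight gradient uses the approximation
        return grad_x, grad_w

# The weight gradient is computed from the 8-bit reconstruction of the activation.
x = torch.randn(32, 256, requires_grad=True)
w = torch.randn(128, 256, requires_grad=True)
CompressedLinear.apply(x, w).sum().backward()
```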

Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees

TLDR
This paper proposes AC-SGD, a novel activation compression algorithm for communication-efficient pipeline parallelism training over slow networks, and shows that it can be combined with state-of-the-art gradient compression algorithms to enable “end-to-end communication compression”.

Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5

TLDR
This work proposes to deeply integrate predictive lossy compression with HDF5 to improve parallel-write performance, and proposes analytical models that predict the compression and parallel-write times before the actual compression to enable compression-write overlapping.
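
The overlap described above amounts to double buffering: while chunk i is being written, chunk i+1 is already being compressed, and the predicted compression time tells the scheduler whether the overlap will pay off. A minimal sketch of the overlap itself follows, with plain file I/O and zlib standing in for parallel HDF5 and the predictive lossy compressor (both are assumptions for illustration).

```python
# Double-buffered compression/write overlap: compress chunk i+1 on a worker
# thread while chunk i is written out. zlib and a plain file stand in for the
# lossy compressor and parallel HDF5 writes described in the paper.
import zlib
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def compress(chunk: np.ndarray) -> bytes:
    return zlib.compress(chunk.tobytes(), 1)

def write_chunks_overlapped(chunks, path):
    with ThreadPoolExecutor(max_workers=1) as pool, open(path, "wb") as f:
        pending = pool.submit(compress, chunks[0])
        for nxt in chunks[1:]:
            data = pending.result()               # finish compressing chunk i
            pending = pool.submit(compress, nxt)  # start compressing chunk i+1 ...
            f.write(data)                         # ... while chunk i is written
        f.write(pending.result())

write_chunks_overlapped([np.random.rand(1 << 20) for _ in range(4)], "out.bin")
```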

References

SHOWING 1-10 OF 51 REFERENCES

DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression

TLDR
This paper proposes DeepSZ: an accuracy-loss expected neural network compression framework, which involves four key steps: network pruning, error bound assessment, optimization for error bound configuration, and compressed model generation, featuring a high compression ratio and low encoding time.

Training Deep Nets with Sublinear Memory Cost

TLDR
This work designs an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch, and shows that it is possible to trade computation for memory, giving a more memory-efficient training algorithm at a small extra computation cost.
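
The O(√n) figure comes from splitting the n layers into about √n segments, keeping only the activations at segment boundaries, and recomputing the interior of each segment during backward. PyTorch's checkpoint_sequential exposes exactly that trade and is used below purely as an illustration of the technique, not as the paper's code.

```python
# Gradient checkpointing over ~sqrt(n) segments: only segment-boundary
# activations are kept; everything inside a segment is recomputed in backward.
import math
import torch
from torch.utils.checkpoint import checkpoint_sequential

n = 64  # number of layer blocks
blocks = [torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU()) for _ in range(n)]
model = torch.nn.Sequential(*blocks)

x = torch.randn(32, 512, requires_grad=True)
segments = int(math.sqrt(n))                       # ~sqrt(n) segments -> O(sqrt(n)) memory
out = checkpoint_sequential(model, segments, x, use_reentrant=False)
out.sum().backward()                               # pays roughly one extra forward pass
```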

JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression

TLDR
This work proposes JPEG for ACTivations (JPEG-ACT), a lossy activation offload accelerator for training CNNs that works by discarding redundant spatial information, and shows how to optimize the JPEG algorithm so as to ensure convergence and maintain accuracy during training.

Superneurons: dynamic GPU memory management for training deep neural networks

TLDR
This work presents SuperNeurons, a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity; it can train ResNet2500, which has 10^4 basic network layers, on a 12GB K40c, and it dynamically allocates the memory for convolution workspaces to achieve high performance.
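
One of the mechanisms a runtime like SuperNeurons schedules is activation offloading: activations live in host memory between the forward and backward passes. As a generic illustration of that mechanism only (not SuperNeurons' runtime), PyTorch's save_on_cpu context manager does this in two lines; a CUDA device is assumed.

```python
# Generic illustration of activation offloading: tensors saved for backward are
# parked in pinned host memory and copied back to the GPU when backward needs them.
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(64, 64, 3, padding=1), torch.nn.ReLU(),
).cuda()
x = torch.randn(16, 3, 224, 224, device="cuda")

with torch.autograd.graph.save_on_cpu(pin_memory=True):
    loss = model(x).sum()   # activations are offloaded to host memory here
loss.backward()             # ...and copied back to the GPU on demand here
```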

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

TLDR
A high-performance virtualization strategy is introduced, based on a "compressing DMA engine" (cDMA) that drastically reduces the size of the data structures targeted for CPU-side allocations.

vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design

The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction …

Layrub: layer-centric GPU memory reuse and data migration in extreme-scale deep learning systems

TLDR
This work identifies the memory usage characteristics of deep and wide convolutional networks, demonstrates the opportunities for memory reuse at both the intra-layer and inter-layer levels, and presents Layrub, a runtime data placement strategy that orchestrates the execution of the training process.

The Reversible Residual Network: Backpropagation Without Storing Activations

TLDR
The Reversible Residual Network (RevNet) is presented, a variant of ResNets in which each layer's activations can be reconstructed exactly from the next layer's; therefore, the activations for most layers need not be stored in memory during backpropagation.
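
The reversible coupling behind RevNet is simple enough to state directly: split the features into two halves, compute y1 = x1 + F(x2) and y2 = x2 + G(y1), and the inputs can be recovered exactly from the outputs. A minimal sketch follows; F and G are arbitrary small modules chosen for illustration.

```python
# Reversible residual coupling: inputs (x1, x2) are reconstructed exactly from
# outputs (y1, y2), so the block's activations need not be stored for backward.
import torch

F = torch.nn.Linear(128, 128)
G = torch.nn.Linear(128, 128)

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    x2 = y2 - G(y1)          # undo the second coupling
    x1 = y1 - F(x2)          # then the first
    return x1, x2

x1, x2 = torch.randn(4, 128), torch.randn(4, 128)
r1, r2 = inverse(*forward(x1, x2))
print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))  # True True
```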

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs

TLDR
Buddy Compression is an architecture that makes novel use of compression to utilize a larger buddy-memory from the host or disaggregated memory, effectively increasing the memory capacity of the GPU.

Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets

TLDR
Evaluation results confirm that the new adaptive solution can significantly improve the rate-distortion trade-off of lossy compression at fairly high compression ratios.
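
The guarantee behind error-controlled compressors of this kind is a user-set pointwise bound |x - x'| <= eps. The linear-scaling quantization step that provides the bound can be written in a few lines; the prediction and entropy-coding stages of a real compressor are omitted, and the data and eps below are illustrative.

```python
# The error-bounded quantization at the heart of error-controlled lossy compression:
# every reconstructed value differs from the original by at most eps.
import numpy as np

def quantize(data: np.ndarray, eps: float) -> np.ndarray:
    return np.round(data / (2 * eps)).astype(np.int32)   # integer codes

def reconstruct(codes: np.ndarray, eps: float) -> np.ndarray:
    return codes.astype(np.float64) * (2 * eps)

data = np.random.rand(1_000_000)
eps = 1e-3
recon = reconstruct(quantize(data, eps), eps)
assert np.max(np.abs(data - recon)) <= eps + 1e-12        # pointwise bound (tiny float slack)
```
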
...