Corpus ID: 13747349

Profile-guided memory optimization for deep neural networks

Taro Sekiyama, Takashi Imamichi, Haruki Imai, Raymond H. Putra
Recent years have seen deep neural networks (DNNs) becoming wider and deeper to achieve better performance in many AI applications. Such DNNs, however, require huge amounts of memory to store weights and intermediate results (e.g., activations, feature maps) during propagation. This requirement makes it difficult to run DNNs on devices with limited, hard-to-extend memory, degrades running-time performance, and restricts the design of network models. We address this challenge by…
Memory Optimization Techniques in Neural Networks: A Review
Deep neural networks have been continuously evolving towards larger and more complex models to solve challenging problems in the field of AI. The primary bottleneck that restricts new network…
Memory-efficient deep learning inference with incremental weight loading and data layout reorganization on edge systems
This study explores an incremental loading strategy for model weights during inference; the proposed schemes are orthogonal to existing models and reduce memory consumption by 61.05% without additional inference-time overhead.
Efficient Memory Management for Deep Neural Net Inference
This work explores various strategies to smartly share memory buffers among intermediate tensors in deep neural nets, finding that employing these strategies can result in a memory footprint up to 11% smaller than the state of the art.
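The buffer-sharing idea can be illustrated with a greedy interval-assignment sketch (a minimal illustration under assumed inputs; the function names and the heuristic are not the paper's actual algorithm): each intermediate tensor has a lifetime given by its first and last use in the execution order, and tensors whose lifetimes do not overlap may occupy the same buffer.

```python
def assign_buffers(tensors):
    """Greedy shared-buffer assignment.

    tensors: list of (name, size_bytes, first_use, last_use) tuples.
    Tensors with non-overlapping lifetimes reuse the same buffer;
    each buffer grows to the largest tensor ever placed in it.
    """
    buffers = []  # each: {"size": int, "free_after": int, "members": [names]}
    for name, size, start, end in sorted(tensors, key=lambda t: t[2]):
        for buf in buffers:
            if buf["free_after"] < start:  # lifetime disjoint -> reuse
                buf["size"] = max(buf["size"], size)
                buf["free_after"] = end
                buf["members"].append(name)
                break
        else:  # no compatible buffer found -> allocate a new one
            buffers.append({"size": size, "free_after": end, "members": [name]})
    return buffers

def total_footprint(buffers):
    """Peak memory if every buffer is live simultaneously."""
    return sum(b["size"] for b in buffers)

# Toy chain a -> b -> c -> d: (name, bytes, first_use, last_use)
tensors = [("a", 100, 0, 1), ("b", 200, 1, 2), ("c", 100, 2, 3), ("d", 50, 3, 4)]
buffers = assign_buffers(tensors)
```

For this toy chain, `a` and `c` share one buffer and `b` and `d` share another, so the footprint drops from 450 bytes (one buffer per tensor) to 300.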
A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
ParDNN decides a placement of a DNN's underlying computational-graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized; it outperforms or qualitatively improves upon related work.
Supporting Very Large Models using Automatic Dataflow Graph Partitioning
Tofu is a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint; it describes the semantics of each operator in a simple language inspired by Halide in order to partition each operator automatically.
Understanding and optimizing packed neural network training for hyper-parameter tuning
This paper proposes pack, a primitive for jointly training multiple neural network models on a single GPU, and presents a comprehensive empirical study of pack along with end-to-end experiments that suggest significant improvements for hyperparameter tuning.


Training Deep Nets with Sublinear Memory Cost
This work designs an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of one extra forward pass per mini-batch, showing that it is possible to trade computation for memory and obtain a more memory-efficient training algorithm at a small extra computation cost.
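The √n trade-off can be sketched in plain numpy (a toy illustration with assumed ReLU layers, not the paper's implementation): the forward pass stores an activation checkpoint only every k-th layer, and the backward pass recomputes the activations inside each segment from its checkpoint before backpropagating through it. With k ≈ √n, stored activations drop from O(n) to O(√n) at the cost of roughly one extra forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, dim = 16, 4
Ws = [rng.standard_normal((dim, dim)) * 0.5 for _ in range(n_layers)]

def layer_fwd(W, x):
    return np.maximum(W @ x, 0.0)          # ReLU(Wx)

def layer_bwd(W, x, grad_out):
    pre = W @ x                             # gradient of ReLU(Wx) w.r.t. x
    return W.T @ (grad_out * (pre > 0))

def grad_full(x0):
    """Baseline: store every activation, then backpropagate."""
    acts = [x0]
    for i in range(n_layers):
        acts.append(layer_fwd(Ws[i], acts[-1]))
    grad = np.ones(dim)                     # d(sum(out))/d(out)
    for i in range(n_layers - 1, -1, -1):
        grad = layer_bwd(Ws[i], acts[i], grad)
    return acts[-1], grad

def grad_checkpointed(x0, k):
    """Store only every k-th activation; recompute the rest per segment."""
    ckpts, x = {0: x0}, x0
    for i in range(n_layers):               # forward: keep checkpoints only
        x = layer_fwd(Ws[i], x)
        if (i + 1) % k == 0:
            ckpts[i + 1] = x
    out, grad = x, np.ones(dim)
    for seg_end in range(n_layers, 0, -k):  # backward, segment by segment
        seg_start = seg_end - k
        acts = [ckpts[seg_start]]           # recompute within the segment
        for i in range(seg_start, seg_end):
            acts.append(layer_fwd(Ws[i], acts[-1]))
        for i in range(seg_end - 1, seg_start - 1, -1):
            grad = layer_bwd(Ws[i], acts[i - seg_start], grad)
    return out, grad
```

Both routines yield identical outputs and gradients; the checkpointed version retains at most n/k checkpoints plus one k-layer segment at a time.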
vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design
The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction…
Superneurons: dynamic GPU memory management for training deep neural networks
This work presents SuperNeurons, a dynamic GPU memory scheduling runtime that enables network training far beyond GPU DRAM capacity; it can train ResNet2500, which has 10^4 basic network layers, on a 12 GB K40c, and dynamically allocates memory for convolution workspaces to achieve high performance.
Memory reduction method for deep neural network training
  • K. Shirahata, Y. Tomita, A. Ike
  • Computer Science
  • 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)
  • 2016
A method to reduce the amount of memory needed for training a deep neural network is presented; it suppresses memory growth during the backward pass by reusing the memory regions allocated for the forward pass.
Training Deeper Models by GPU Memory Optimization on TensorFlow
With the advent of big data, readily available GPGPUs, and progress in neural network modeling techniques, training deep learning models on GPUs has become a popular choice. However, due to the inherent…
Learning both Weights and Connections for Efficient Neural Network
A method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections; redundant connections are pruned using a three-step method.
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that together reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Compressing Deep Convolutional Networks using Vector Quantization
This paper achieves 16-24x compression of the network with only 1% loss of classification accuracy using state-of-the-art CNNs, and finds that, for compressing the most storage-demanding densely connected layers, vector quantization methods have a clear gain over existing matrix factorization methods.
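The vector-quantization idea can be sketched in numpy (a toy illustration under assumed sizes and a naive Lloyd's k-means; not the paper's actual method or codebook settings): the weight matrix is split into sub-vectors, a small codebook is learned, and only the codebook plus per-sub-vector indices are stored.

```python
import numpy as np

def kmeans_quantize(W, n_codes=16, subdim=4, iters=10, seed=0):
    """Vector-quantize W: split into sub-vectors of length subdim,
    learn a codebook with Lloyd's algorithm, and reconstruct each
    sub-vector from its nearest centroid."""
    rng = np.random.default_rng(seed)
    vecs = W.reshape(-1, subdim)
    codebook = vecs[rng.choice(len(vecs), n_codes, replace=False)].copy()
    for _ in range(iters):
        d2 = ((vecs[:, None, :] - codebook[None]) ** 2).sum(-1)
        idx = d2.argmin(1)                      # assign to nearest centroid
        for c in range(n_codes):                # recompute centroids
            members = vecs[idx == c]
            if len(members):
                codebook[c] = members.mean(0)
    d2 = ((vecs[:, None, :] - codebook[None]) ** 2).sum(-1)
    idx = d2.argmin(1)
    W_hat = codebook[idx].reshape(W.shape)
    return W_hat, codebook, idx

def compression_ratio(W, codebook, idx, n_codes):
    """float32 weights vs. float32 codebook + log2(n_codes)-bit indices."""
    orig = W.size * 4
    comp = codebook.size * 4 + idx.size * np.log2(n_codes) / 8
    return orig / comp

# Demo: quantize a random 16x16 layer with a 16-entry codebook.
rng = np.random.default_rng(1)
W = rng.standard_normal((16, 16))
W_hat, codebook, idx = kmeans_quantize(W)
```

For this toy layer (256 float32 weights vs. a 16x4 codebook plus 64 four-bit indices) the storage ratio is about 3.6x; real gains come from quantizing the much larger fully connected layers.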
Sequence to Sequence Learning with Neural Networks
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and target sentences that made the optimization problem easier.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth.