Corpus ID: 202628209

Full Deep Neural Network Training On A Pruned Weight Budget

@article{Golub2019FullDN,
  title={Full Deep Neural Network Training On A Pruned Weight Budget},
  author={Maximilian Golub and Guy Lemieux and Mieszko Lis},
  journal={arXiv: Learning},
  year={2019}
}
We introduce a DNN training technique that learns only a fraction of the full parameter set without incurring an accuracy penalty. To do this, our algorithm constrains the total number of weights updated during backpropagation to those with the highest total gradients. The remaining weights are not tracked, and their initial value is regenerated at every access to avoid storing them in memory. This can dramatically reduce the number of off-chip memory accesses during both training and inference…
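To make the mechanism concrete, here is a minimal sketch (an editor's illustration, not the authors' implementation): untracked weights are regenerated deterministically from a PRNG seed on every access, and only a small budget of weights, selected by accumulated gradient magnitude, is stored and updated. The class name TrackedLayer, the per-layer budget, and the plain SGD update are assumptions made for the example.

import numpy as np

class TrackedLayer:
    # One linear layer trained under a pruned weight budget (illustrative sketch).
    def __init__(self, in_dim, out_dim, budget, seed=0):
        self.shape = (in_dim, out_dim)
        self.seed = seed                    # initial weights are regenerated, never stored
        self.budget = budget                # maximum number of weights that may be tracked
        self.tracked = {}                   # flat index -> trained value
        self.grad_accum = np.zeros(in_dim * out_dim)

    def _initial_weights(self):
        # Deterministically recreate the initial weights from the seed at every access.
        rng = np.random.default_rng(self.seed)
        return rng.standard_normal(self.shape) * np.sqrt(2.0 / self.shape[0])

    def weights(self):
        # Dense weights for the forward pass: regenerated init plus trained overlay.
        w = self._initial_weights()
        for i, v in self.tracked.items():
            w.flat[i] = v
        return w

    def backward(self, grad_w, lr=0.01):
        # Track only the weights with the largest accumulated gradient magnitude.
        self.grad_accum += np.abs(grad_w).ravel()
        top = np.argsort(self.grad_accum)[-self.budget:]
        init = self._initial_weights().ravel()
        self.tracked = {i: self.tracked.get(i, init[i]) for i in top}
        for i in top:                       # SGD update on the tracked subset only
            self.tracked[i] -= lr * grad_w.flat[i]

layer = TrackedLayer(256, 128, budget=1000)
W = layer.weights()                         # use W in the forward pass

Citations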
Low-Memory Neural Network Training: A Technical Report
TLDR: This paper profiles the overall memory usage of training on two representative deep learning benchmarks and comprehensively evaluates four standard techniques for reducing the training memory requirements: imposing sparsity on the model, using low precision, microbatching, and gradient checkpointing.
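One of the four techniques above, microbatching, amounts to gradient accumulation. A hedged PyTorch sketch (the model, sizes, and learning rate are arbitrary placeholders, not the report's setup): a large logical batch is processed as several small microbatches whose gradients accumulate before a single optimizer step, so peak activation memory scales with the microbatch size.

import torch
import torch.nn as nn

model = nn.Linear(128, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

data = torch.randn(64, 128)                 # one logical batch of 64 examples
labels = torch.randint(0, 10, (64,))
micro = 8                                   # microbatch size

opt.zero_grad()
for i in range(0, data.size(0), micro):
    xb, yb = data[i:i + micro], labels[i:i + micro]
    # Scale so the accumulated gradient matches the mean over the full batch.
    loss = loss_fn(model(xb), yb) * xb.size(0) / data.size(0)
    loss.backward()                         # gradients accumulate in .grad
opt.step()                                  # one update for the whole logical batch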
Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training
TLDR: It is demonstrated that accelerating sparse training requires a co-design approach where algorithms are adapted to suit the constraints of hardware, and that hardware for sparse DNN training must tackle constraints that do not arise in inference accelerators.
MARViN - Multiple Arithmetic Resolutions Vacillating in Neural Networks
TLDR: MARViN is introduced, a new quantized training strategy that uses information-theory-based intra-epoch precision switching to find, on a per-layer basis, the lowest precision that causes no quantization-induced information loss while keeping precision high enough that future learning steps do not suffer from vanishing gradients, producing a fully quantized DNN.
Adaptive Precision Training (ADEPT): A dynamic fixed point quantized sparsifying training approach for DNNs
Quantization is a technique for reducing deep neural network (DNN) training and inference times, which is crucial for training in resource-constrained environments or for time-critical inference…
Campfire: Compressible, Regularization-Free, Structured Sparse Training for Hardware Accelerators
TLDR: The proposed training methodology, Campfire, explores pruning at granularities within a convolutional kernel and filter; the results show that with 70% target sparsity, over 75% top-1 accuracy is achievable.
TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training
TLDR: TensorDash is a hardware-based technique that enables data-parallel MAC units to take advantage of sparsity in their input operand streams, speeding up the training process while also increasing energy efficiency when used to compose a hardware accelerator for deep learning.
Adaptive Precision Training (AdaPT): A dynamic fixed point quantized training approach for DNNs
TLDR: AdaPT is introduced, a new fixed-point quantized sparsifying training strategy that aims to determine, on a per-layer basis, the lowest precision that causes no quantization-induced information loss while keeping the precision high enough that future learning steps do not suffer from vanishing gradients.
BlockSwap: Fisher-guided Block Substitution for Network Compression
TLDR: This work develops BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques.
BLOCKSWAP: FISHER-GUIDED BLOCK SUBSTITUTION
The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a…
Dynamic Neural Network Architectural and Topological Adaptation and Related Methods - A Survey
TLDR: This survey aims to provide a general overview and categorization of state-of-the-art (SOTA) techniques for reducing DNN training and inference time and space complexity, with a particular focus on architectural adaptations.

References

Showing 1-10 of 60 references
Training Deep Nets with Sublinear Memory Cost
TLDR: This work designs an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch, and shows that it is possible to trade computation for memory, giving a more memory-efficient training algorithm with a little extra computation cost.
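The O(√n) result corresponds to activation (gradient) checkpointing: store activations only at segment boundaries and recompute the rest during the backward pass. A minimal PyTorch illustration, not taken from the paper's code; the layer sizes and segment count are arbitrary, and recent PyTorch versions may warn about the use_reentrant default.

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# 16 blocks split into 4 segments (~sqrt(16)); only segment-boundary activations
# are kept, and intermediate activations are recomputed in the backward pass.
blocks = [nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(16)]
model = nn.Sequential(*blocks)

x = torch.randn(32, 512, requires_grad=True)
out = checkpoint_sequential(model, 4, x)    # trade extra forward compute for memory
out.sum().backward()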
Learning both Weights and Connections for Efficient Neural Network
TLDR: A method is presented to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections and pruning redundant connections using a three-step method.
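A rough sketch of the prune step in that three-step method (train, prune, retrain), assuming simple per-layer magnitude pruning of Linear layers in PyTorch; the sparsity level and the mask-the-gradients retraining trick are illustrative choices, not the paper's exact procedure.

import torch
import torch.nn as nn

def magnitude_prune(model, sparsity=0.9):
    # Zero out the smallest-magnitude weights of each Linear layer and return masks.
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            threshold = w.abs().flatten().quantile(sparsity)
            mask = (w.abs() > threshold).float()
            module.weight.data.mul_(mask)   # remove the unimportant connections
            masks[name] = mask
    return masks

def mask_gradients(model, masks):
    # Call after loss.backward() so pruned connections stay zero during retraining.
    for name, module in model.named_modules():
        if name in masks and module.weight.grad is not None:
            module.weight.grad.mul_(masks[name])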
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
TLDR: A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
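The training trick behind such binarized networks is typically a straight-through estimator: binarize in the forward pass, but let gradients flow as if the binarization were a clipped identity. A hedged PyTorch sketch of that estimator only; the optimized GPU kernel mentioned above is out of scope here.

import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # Constrain values to +1 / -1 (zero mapped to +1 by convention).
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through: pass gradients unchanged, but cancel them outside [-1, 1].
        return grad_output * (w.abs() <= 1).to(grad_output.dtype)

w = torch.randn(4, 4, requires_grad=True)
wb = BinarizeSTE.apply(w)                   # use wb in place of w in a matmul
wb.sum().backward()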
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
TLDR: This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that works together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
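Of the three stages, the trained-quantization (weight-sharing) step is easy to sketch: cluster a layer's weights into 2^b shared values and store only small indices plus a codebook. The k-means-style loop below is an editor's illustration with arbitrary bit-width and iteration count; pruning before and Huffman coding after are omitted.

import numpy as np

def weight_share(w, bits=4, iters=10):
    # Cluster weights into 2**bits shared values; return per-weight codes and the codebook.
    flat = w.ravel()
    centroids = np.linspace(flat.min(), flat.max(), 2 ** bits)   # linear initialization
    for _ in range(iters):
        codes = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(len(centroids)):
            if np.any(codes == k):
                centroids[k] = flat[codes == k].mean()
    codes = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return codes.reshape(w.shape), centroids

w = np.random.randn(64, 64).astype(np.float32)
codes, codebook = weight_share(w)
w_shared = codebook[codes]                  # layer reconstructed from 4-bit codes + codebook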
Compression-aware Training of Deep Networks
TLDR: It is shown that accounting for compression during training allows us to learn much more compact, yet at least as effective, models than state-of-the-art compression techniques.
Learning Efficient Convolutional Networks through Network Slimming
TLDR: The approach, called network slimming, takes wide and large networks as input models; during training, insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy.
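The channel-selection signal in network slimming is the BatchNorm scale factor: an L1 penalty on each gamma during training pushes unimportant channels toward zero so they can be pruned. A brief PyTorch sketch of just that penalty term; the penalty weight and thresholding policy are placeholder assumptions.

import torch
import torch.nn as nn

def bn_l1_penalty(model, lam=1e-4):
    # Sum of |gamma| over all BatchNorm2d layers; add this term to the task loss.
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return lam * penalty

# Training: loss = criterion(outputs, targets) + bn_l1_penalty(model)
# Pruning: channels whose |gamma| falls below a global threshold are removed afterwards.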
Revisiting Small Batch Training for Deep Neural Networks
TLDR: The collected experimental results show that increasing the mini-batch size progressively reduces the range of learning rates that provide stable convergence and acceptable test performance, which contrasts with recent work advocating the use of mini-batch sizes in the thousands.
Gist: Efficient Data Encoding for Deep Neural Network Training
TLDR: This paper investigates widely used DNNs and finds that the major contributors to memory footprint are intermediate layer outputs (feature maps), and introduces a framework for DNN-layer-specific optimizations that significantly reduce this source of main memory pressure on GPUs.
WRPN: Wide Reduced-Precision Networks
TLDR: This work reduces the precision of activation maps (along with model parameters) and increases the number of filter maps in a layer, and finds that this scheme matches or surpasses the accuracy of the baseline full-precision network.
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
TLDR: ThiNet is proposed, an efficient and unified framework to simultaneously accelerate and compress CNN models in both training and inference stages; it is revealed that filters need to be pruned based on statistics computed from the next layer, not the current layer, which differentiates ThiNet from existing methods.