Corpus ID: 25717514

Block-Sparse Recurrent Neural Networks

@article{Narang2017BlockSparseRN,
  title={Block-Sparse Recurrent Neural Networks},
  author={Sharan Narang and Eric Undersander and Gregory Frederick Diamos},
  journal={ArXiv},
  year={2017},
  volume={abs/1711.02782}
}
Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. [...] Using these techniques, we can create block-sparse RNNs with sparsity ranging from 80% to 90% with a small loss in accuracy. This technique allows us to reduce the model size by roughly 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter…
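To make the block pruning sketched in the abstract concrete, here is a minimal NumPy illustration (not the authors' implementation): weights are grouped into fixed-size blocks, each block is scored by its largest-magnitude entry, and blocks scoring below a threshold are zeroed so the matrix becomes block-sparse. The block size and the single fixed threshold are illustrative assumptions; the paper prunes gradually during training rather than in one shot.

import numpy as np

def block_prune(weights, block_size=(4, 4), threshold=0.1):
    """Zero out blocks whose largest-magnitude entry falls below `threshold`.

    A minimal sketch of block-wise magnitude pruning; the block size, the
    max-|w| scoring rule, and the fixed threshold are assumptions for
    illustration, not the paper's training-time pruning schedule.
    """
    pruned = weights.copy()
    rows, cols = weights.shape
    br, bc = block_size
    for i in range(0, rows, br):
        for j in range(0, cols, bc):
            block = pruned[i:i + br, j:j + bc]
            if np.abs(block).max() < threshold:
                block[:] = 0.0  # drop the whole block
    return pruned

# Example: prune a random recurrent weight matrix and report its sparsity.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 64))
W_sparse = block_prune(W, block_size=(4, 4), threshold=0.15)
print("fraction of zero weights:", np.mean(W_sparse == 0.0))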
Block-wise Dynamic Sparseness
TLDR: A new method for dynamic sparseness, whereby part of the computations are omitted dynamically based on the input; the method achieves language modeling perplexities similar to the dense baseline at half the computational cost at inference time.
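As a rough, hypothetical illustration of input-dependent (dynamic) sparseness, the sketch below skips low-scoring column blocks of a weight matrix when computing a matrix-vector product. The block size, the scoring rule (norm of the matching input slice), and the keep ratio are assumptions for illustration; they are not the gating mechanism of the paper.

import numpy as np

def dynamic_block_matvec(W, x, block_size=16, keep_ratio=0.5):
    """Compute an approximation of W @ x, skipping low-scoring column blocks.

    Hypothetical sketch of dynamic block sparseness: column blocks of W are
    scored per input by the L2 norm of the matching slice of x, and only the
    top `keep_ratio` fraction of blocks contribute to the product.
    """
    n_blocks = W.shape[1] // block_size
    scores = np.array([
        np.linalg.norm(x[b * block_size:(b + 1) * block_size])
        for b in range(n_blocks)
    ])
    n_keep = max(1, int(keep_ratio * n_blocks))
    kept = np.argsort(scores)[-n_keep:]  # indices of the blocks to compute
    y = np.zeros(W.shape[0])
    for b in kept:
        cols = slice(b * block_size, (b + 1) * block_size)
        y += W[:, cols] @ x[cols]  # only these blocks are evaluated
    return y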
Dynamic Block Sparse Reparameterization of Convolutional Neural Networks
TLDR: This work focuses on block sparsity and generates efficient block-sparse convolutional neural networks using DBSR (dynamic block sparse reparameterization), which decreases the parameters and FLOPs of ResNeXt50 by a factor of 2x with only a 0.48 increase in Top-1 error.
Rethinking Full Connectivity in Recurrent Neural Networks
TLDR: Structurally sparse RNNs are studied, showing that they are well suited to acceleration on parallel hardware, with a greatly reduced cost for the recurrent operations and orders of magnitude fewer recurrent weights.
Hierarchical Block Sparse Neural Networks
TLDR: This work jointly addresses the accuracy and performance of sparse DNNs using a proposed class of sparse neural networks called HBsNN (Hierarchical Block sparse Neural Networks).
Structured Pruning of Recurrent Neural Networks through Neuron Selection
TLDR: This work proposes a structured pruning method through neuron selection that can remove independent neurons of RNNs, introducing two sets of binary random variables that can be interpreted as gates or switches on the input neurons and the hidden neurons, respectively.
Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity
TLDR: A new LSTM training technique based on hierarchical coarse-grain sparsity (HCGS), which enforces hierarchical structured sparsity by randomly dropping static block-wise connections between layers to reduce weight storage for both training and inference hardware systems.
Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training
TLDR: This work proposes to structure dropout patterns by dropping out the same set of physical neurons within a batch, resulting in column (row) level hidden-state sparsity, which is amenable to computation reduction at run time on general-purpose SIMD hardware as well as systolic arrays.
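A minimal sketch of the batch-shared dropout pattern described above, assuming a hidden-state matrix of shape (batch, hidden): because one mask is shared across the whole batch, entire columns of the hidden-state matrix become zero, which structured SIMD or systolic-array kernels can skip. The drop probability and the inverted-dropout rescaling are illustrative choices, not the paper's training recipe.

import numpy as np

def structured_dropout(h, drop_prob=0.3, rng=None):
    """Drop the same hidden units for every example in the batch.

    `h` has shape (batch, hidden). Sharing a single mask across the batch
    yields column-level sparsity in the hidden-state matrix.
    """
    rng = rng if rng is not None else np.random.default_rng()
    keep = rng.random(h.shape[1]) >= drop_prob  # one mask per batch, not per example
    scale = 1.0 / (1.0 - drop_prob)             # inverted-dropout rescaling
    return h * keep[None, :] * scale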
One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
TLDR: A new recurrent pruning objective derived from the spectrum of the recurrent Jacobian is introduced, which is data efficient, easy to implement, and produces 95% sparse GRUs that significantly improve on existing baselines.
Accelerating Sparse Deep Neural Networks
TLDR: The design and behavior of Sparse Tensor Cores are presented, which exploit a 2:4 (50%) sparsity pattern that leads to twice the math throughput of dense matrix units.
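The 2:4 pattern itself is simple enough to show directly. The NumPy sketch below keeps the two largest-magnitude values in every group of four consecutive weights and zeroes the other two; it illustrates only the pruning pattern, not the compressed storage format or the Sparse Tensor Core math pipeline.

import numpy as np

def prune_2_to_4(weights):
    """Enforce 2:4 sparsity along the last axis: in each group of four
    consecutive weights, keep the two largest-magnitude values, zero the rest."""
    assert weights.shape[-1] % 4 == 0, "last dimension must be a multiple of 4"
    groups = weights.reshape(-1, 4)
    order = np.argsort(np.abs(groups), axis=1)            # ascending by magnitude
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, order[:, :2], False, axis=1)  # drop the two smallest
    return (groups * mask).reshape(weights.shape)

W = np.arange(8, dtype=float).reshape(2, 4) - 3.0  # [[-3, -2, -1, 0], [1, 2, 3, 4]]
print(prune_2_to_4(W))                             # [[-3. -2.  0.  0.]  [ 0.  0.  3.  4.]]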
CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks
TLDR: This paper presents CSB-RNN, an optimized full-stack RNN framework with a novel compressed structured block (CSB) pruning technique, and proposes a novel hardware architecture with a dedicated compiler that addresses the challenging workload-imbalance issue and significantly improves hardware efficiency.

References

Showing 1-10 of 32 references
Exploring Sparsity in Recurrent Neural Networks
TLDR: This work proposes a technique to reduce the parameters of a network by pruning weights during the initial training of the network, which reduces the size of the model and can also help achieve significant inference-time speed-up using sparse matrix multiplication.
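A hedged sketch of the pruning-during-training idea described above: after each optimizer step, weights whose magnitude falls below a slowly rising threshold are zeroed and held at zero by a mask. The linear threshold ramp and the hyperparameter values are assumptions for illustration, not the paper's schedule.

import numpy as np

def magnitude_prune_step(W, mask, step, start_step, end_step, final_threshold):
    """One pruning update applied during training.

    The threshold ramps linearly from 0 to `final_threshold` between
    `start_step` and `end_step` (an assumed schedule); weights below it are
    zeroed and the boolean `mask` keeps them at zero afterwards.
    """
    if step >= start_step:
        frac = min(1.0, (step - start_step) / max(1, end_step - start_step))
        threshold = frac * final_threshold
        mask &= np.abs(W) >= threshold  # once pruned, a weight stays pruned
    return W * mask, mask

# Usage inside a training loop (gradient update omitted):
#   mask = np.ones_like(W, dtype=bool)
#   for step in range(total_steps):
#       ...optimizer step on W...
#       W, mask = magnitude_prune_step(W, mask, step, 1000, 20000, 0.05)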
Exploiting sparseness in deep neural networks for large vocabulary speech recognition
  • Dong Yu, F. Seide, G. Li, L. Deng
  • 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012
TLDR: Enforcing sparseness is formulated as soft regularization and convex constraint optimization problems, solutions under the stochastic gradient ascent setting are proposed, and novel data structures are proposed to exploit the random sparseness patterns to reduce model size and computation time.
Learning Structured Sparsity in Deep Neural Networks
TLDR: The results show that for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.
Improving the speed of neural networks on CPUs
TLDR: This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large-vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy.
Exploring the Regularity of Sparse Structure in Convolutional Neural Networks
TLDR: This paper quantitatively measures the trade-off between sparsity regularity and prediction accuracy, providing insights into how to maintain accuracy while having a more structured sparsity pattern.
Sparse Convolutional Neural Networks
TLDR: This work shows how to reduce the redundancy in these parameters using a sparse decomposition, and proposes an efficient sparse matrix multiplication algorithm on CPU for Sparse Convolutional Neural Network (SCNN) models.
Learning Intrinsic Sparse Structures within Long Short-term Memory
TLDR: This work proposes Intrinsic Sparse Structures (ISS) to learn structurally sparse Long Short-Term Memory by reducing the sizes of basic structures within LSTM units, including input updates, gates, hidden states, cell states, and outputs.
Mixed Precision Training
TLDR: This work introduces a technique to train deep neural networks using half-precision floating point numbers, and demonstrates that this approach works for a wide variety of models including convolutional neural networks, recurrent neural networks, and generative adversarial networks.
Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks
A low-precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to…
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
TLDR: This work introduces "deep compression", a three-stage pipeline (pruning, trained quantization, and Huffman coding) whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.