LCNN: Lookup-Based Convolutional Neural Network

@article{Bagherinezhad2017LCNNLC,
  title={LCNN: Lookup-Based Convolutional Neural Network},
  author={Hessam Bagherinezhad and Mohammad Rastegari and Ali Farhadi},
  journal={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={860-869}
}
Porting state-of-the-art deep learning algorithms to resource-constrained compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose a fast, compact, and accurate model for convolutional neural networks that enables efficient learning and inference. We introduce LCNN, a lookup-based convolutional neural network that encodes convolutions by a few lookups to a dictionary that is trained to cover the space of weights in CNNs. Training LCNN involves jointly learning a dictionary…
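To make the dictionary idea concrete, the following is a minimal NumPy sketch of a lookup-based 1x1 convolution at a single spatial position: the input is correlated with a small shared dictionary once, and each output filter is then assembled from only a few lookups into that shared table. The shapes, the fixed random sparse code, and the operation counts are illustrative assumptions, not the paper's training procedure.

```python
# Lookup-based 1x1 convolution at one spatial position (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)

C_in, C_out = 64, 128        # input / output channels (example values)
k, s = 16, 3                 # dictionary size, lookups per output filter

D = rng.standard_normal((k, C_in))         # shared dictionary of channel vectors
idx = rng.integers(0, k, size=(C_out, s))  # which dictionary rows each filter uses
coef = rng.standard_normal((C_out, s))     # combination coefficients

# Dense equivalent: every filter is a sparse combination of dictionary rows.
W = np.zeros((C_out, C_in))
for o in range(C_out):
    W[o] = coef[o] @ D[idx[o]]

x = rng.standard_normal(C_in)              # one spatial position of the input

# Standard 1x1 convolution at this position: C_out * C_in multiply-adds.
y_dense = W @ x

# Lookup version: correlate the input with the dictionary once (k * C_in
# multiply-adds), then each filter needs only s lookups and s multiply-adds.
S = D @ x                                  # shared table for this position
y_lookup = np.einsum('os,os->o', coef, S[idx])

assert np.allclose(y_dense, y_lookup)
print("dense MACs:", C_out * C_in, " lookup MACs:", k * C_in + C_out * s)
```

The saving comes from sharing the dictionary correlation S across all output filters; in the full method the dictionary, lookup indices, and coefficients are learned jointly rather than fixed at random as above.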

Citations

ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks
TLDR
The autoencoder-based low-rank filter-sharing technique (ALF) is compared to state-of-the-art pruning methods, demonstrating its compression capabilities both on theoretical metrics and on an accurate, deterministic hardware model.
Towards Evolutionary Compression
TLDR
This paper presents an evolutionary method that automatically eliminates redundant convolution filters, representing each compressed network as a binary individual with a specific fitness; the compressed network keeps the original structure and can be directly deployed in any off-the-shelf deep learning library.
Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks
TLDR
This paper proposes a novel decomposition approach based on SVD, namely depth-wise decomposition, for expanding regular convolutions into depthwise separable convolutions while maintaining high accuracy, and improves the Top-1 accuracy of ShuffleNet V2 by ~2%.
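A back-of-the-envelope sketch of why replacing a regular convolution with a depthwise-plus-pointwise pair pays off; the channel counts, kernel size, and feature-map size are arbitrary example values, and the SVD-based fitting of the decomposition is not shown.

```python
# Cost of a regular KxK convolution vs. a depthwise separable one (MAC = multiply-accumulate).
C_in, C_out, K, H, W = 128, 128, 3, 56, 56   # example layer shape

regular_params   = C_out * C_in * K * K
separable_params = C_in * K * K + C_out * C_in   # depthwise KxK + pointwise 1x1

regular_macs   = regular_params * H * W
separable_macs = separable_params * H * W

print(f"params: {regular_params} -> {separable_params} "
      f"({regular_params / separable_params:.1f}x smaller)")
print(f"MACs:   {regular_macs} -> {separable_macs} "
      f"({regular_macs / separable_macs:.1f}x fewer)")
```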
PCNN: Environment Adaptive Model Without Finetuning
TLDR
A probability layer is proposed: an easily implemented and highly flexible add-on module that adapts the model efficiently at runtime without any fine-tuning, achieving performance equivalent to or better than transfer learning.
Distribution-Aware Binarization of Neural Networks for Sketch Recognition
TLDR
This work presents a highly generalized, distribution-aware approach to binarizing deep networks that retains the advantages of a binarized network while reducing the drop in accuracy.
Impostor Networks for Fast Fine-Grained Recognition
TLDR
In a series of experiments with three fine-grained datasets, it is shown that impostor networks are able to boost the classification accuracy of a moderate-sized convolutional network considerably at a very small computational cost.
ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
TLDR
ShiftCNN is a generalized low-precision architecture for inference of multiplierless convolutional neural networks (CNNs); it is based on a power-of-two weight representation and performs only shift and addition operations, which substantially reduces the computational cost of convolutional layers by precomputing convolution terms.
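A minimal sketch of rounding weights to signed powers of two, the representation that lets each multiplication become a bit shift at inference time; the codebook construction below is a generic assumption rather than ShiftCNN's exact scheme.

```python
# Round each weight to the nearest value in {0, +/-2^e} for a small exponent range.
import numpy as np

def quantize_pow2(w, n_bits=4):
    max_exp = int(np.floor(np.log2(np.max(np.abs(w)) + 1e-12)))
    exps = np.arange(max_exp, max_exp - 2 ** (n_bits - 1) + 1, -1)
    levels = np.concatenate(([0.0], 2.0 ** exps, -(2.0 ** exps)))
    idx = np.argmin(np.abs(w[..., None] - levels), axis=-1)  # nearest-level assignment
    return levels[idx]

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)) * 0.1
wq = quantize_pow2(w)
print("unique levels:", np.unique(wq).size,
      " mean abs error:", np.mean(np.abs(w - wq)))
```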
Packing Convolutional Neural Networks in the Frequency Domain
TLDR
A series of approaches for compressing and speeding up CNNs in the frequency domain, which focuses not only on smaller weights but on all the weights and their underlying connections, and explores a data-driven method for removing redundancies in both spatial and frequency domains.
Speeding-up convolutional neural networks: A survey
TLDR
This short survey covers several research directions for speeding up CNNs that have become popular recently, including approaches based on tensor decompositions, weight quantization, weight pruning, and teacher-student approaches.
Design and Optimization of Hardware Accelerators for Deep Learning
TLDR
This dissertation proposes two hardware units, ISAAC and Newton, and shows that in-situ computing designs can outperform digital DNN accelerators if they leverage pipelining and smart encodings and can distribute a computation in time and space, both within and across crossbars.
...

References

Showing 1-10 of 50 references
Speeding up Convolutional Neural Networks with Low Rank Expansions
TLDR
Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low-rank basis of filters that are rank-1 in the spatial domain.
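A sketch of the rank-1 building block behind such low-rank expansions: a single KxK spatial filter is approximated via SVD by a vertical filter followed by a horizontal one. The random filter is only illustrative; learned filters are typically much closer to separable than random ones.

```python
# Rank-1 (separable) approximation of one 5x5 spatial filter via SVD.
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal((5, 5))      # one spatial filter (random, for illustration)

U, S, Vt = np.linalg.svd(f)
v = U[:, 0] * np.sqrt(S[0])          # 5x1 vertical filter
h = Vt[0, :] * np.sqrt(S[0])         # 1x5 horizontal filter
f_rank1 = np.outer(v, h)             # best rank-1 approximation of f

# Convolving with v then h costs 2*K MACs per pixel instead of K*K.
rel_err = np.linalg.norm(f - f_rank1) / np.linalg.norm(f)
print(f"rank-1 relative error: {rel_err:.2f}, MACs per pixel: {5 * 5} -> {2 * 5}")
```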
Quantized Convolutional Neural Networks for Mobile Devices
TLDR
This paper proposes an efficient framework, namely Quantized CNN, to simultaneously speed up the computation and reduce the storage and memory overhead of CNN models.
Compressing Deep Convolutional Networks using Vector Quantization
TLDR
This paper achieves 16-24x compression of the network with only a 1% loss in classification accuracy using a state-of-the-art CNN, and finds that, for compressing the most storage-demanding densely connected layers, vector quantization methods have a clear gain over existing matrix factorization methods.
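A small sketch of vector-quantizing a weight matrix with k-means (via scikit-learn): weights are replaced by a short codebook plus 8-bit sub-vector indices. The matrix size, sub-vector length, and codebook size are arbitrary example values, not the paper's settings.

```python
# k-means vector quantization of a dense weight matrix.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)   # a dense layer (example size)

d, k = 4, 256                         # sub-vector length, codebook size
blocks = W.reshape(-1, d)             # split rows into length-d sub-vectors

km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(blocks)
codes = km.predict(blocks).astype(np.uint8)               # one 8-bit index per sub-vector
W_hat = km.cluster_centers_[codes].reshape(W.shape)       # reconstructed weights

orig_bits = W.size * 32
compressed_bits = codes.size * 8 + k * d * 32             # indices + codebook
print(f"compression: {orig_bits / compressed_bits:.1f}x, "
      f"rel. error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```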
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
TLDR
Using large state-of-the-art models, this work demonstrates speedups of convolutional layers by a factor of 2x on both CPU and GPU, while keeping the accuracy within 1% of the original model.
Fixed point optimization of deep convolutional neural networks for object recognition
TLDR
The results indicate that quantization induces sparsity in the network, which reduces the effective number of network parameters and improves generalization; it also cuts the required memory storage to roughly a tenth while achieving better classification results than the high-precision networks.
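For contrast with the power-of-two scheme above, a generic uniform fixed-point quantizer with a single per-tensor scale; this sketches the kind of quantization studied in this line of work, not the paper's exact retraining procedure.

```python
# Uniform fixed-point quantization of weights to b bits with one shared scale.
import numpy as np

def to_fixed_point(w, bits=8):
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    q = np.clip(np.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q.astype(np.int8), scale        # integer weights + one float scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000) * 0.05
q, scale = to_fixed_point(w, bits=8)
w_hat = q.astype(np.float32) * scale       # dequantized approximation
print("storage: 32 -> 8 bits per weight, rel. error:",
      np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```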
Learning Structured Sparsity in Deep Neural Networks
TLDR
The results show that for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.
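The regularizer family behind this kind of structured sparsity is a group lasso over whole structures (e.g. entire filters), which pushes groups to exactly zero so they can be removed; a minimal sketch of the penalty alone, without the full training setup.

```python
# Group-lasso penalty over output filters of a conv weight tensor.
import numpy as np

def group_lasso(W):
    """Sum of L2 norms of each output filter in a (C_out, C_in, K, K) tensor."""
    return np.sum(np.sqrt(np.sum(W.reshape(W.shape[0], -1) ** 2, axis=1)))

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32, 3, 3))
print("group-lasso penalty:", group_lasso(W))   # added to the task loss during training
```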
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
TLDR
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that together reduce the storage requirements of neural networks by 35x to 49x without affecting their accuracy.
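A toy-scale walk through the three stages, with the Huffman stage approximated by the empirical entropy of the quantization codes; the pruning threshold, the 16-entry codebook, and the 1-bit-per-weight mask accounting are simplifying assumptions, not the paper's exact scheme.

```python
# Prune, quantize, and estimate the entropy-coded size of a toy weight vector.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000).astype(np.float32)

# 1) prune: drop small-magnitude weights
mask = np.abs(w) > np.quantile(np.abs(w), 0.9)       # keep the top 10%
w_kept = w[mask]

# 2) quantization: cluster surviving weights into 16 shared values
levels = np.quantile(w_kept, np.linspace(0, 1, 16))
codes = np.argmin(np.abs(w_kept[:, None] - levels), axis=1)

# 3) Huffman coding: approximated by the empirical entropy of the codes
p = np.bincount(codes, minlength=16) / codes.size
entropy_bits = -np.sum(p[p > 0] * np.log2(p[p > 0]))

dense_bits = w.size * 32
packed_bits = codes.size * entropy_bits + mask.size  # coded weights + 1-bit mask
print(f"~{dense_bits / packed_bits:.0f}x smaller "
      f"({entropy_bits:.2f} bits per surviving weight)")
```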
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
TLDR
A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
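A sketch of weight binarization with a per-row scaling factor, checked against the full-precision product; the real-valued scale follows common practice for binary networks in general and is an assumption here, not necessarily this paper's exact formulation.

```python
# Binarize a weight matrix to +/-1 and compare the product with full precision.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)

B = np.sign(W)                      # weights constrained to +1 / -1
alpha = np.mean(np.abs(W), axis=1)  # per-row scale (real-valued)

y_full = W @ x
y_bin = alpha * (B @ x)             # only additions/subtractions plus one scale per row

cos = np.dot(y_full, y_bin) / (np.linalg.norm(y_full) * np.linalg.norm(y_bin))
print("cosine similarity with full precision:", round(float(cos), 3))
```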
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Learning both Weights and Connections for Efficient Neural Network
TLDR
A method that reduces the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections and pruning redundant ones with a three-step method.
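A sketch of the pruning step, keeping only large-magnitude connections and storing the survivors in sparse CSR form (via SciPy); the retraining that recovers accuracy is omitted, and the 90% pruning ratio is an arbitrary example.

```python
# Magnitude pruning of a dense layer, stored and applied as a sparse matrix.
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)
x = rng.standard_normal(512).astype(np.float32)

threshold = np.quantile(np.abs(W), 0.9)              # remove 90% of connections
W_pruned = np.where(np.abs(W) > threshold, W, 0.0)
W_sparse = csr_matrix(W_pruned)                      # store only surviving connections

y_dense = W_pruned @ x
y_sparse = W_sparse @ x                              # same result, ~10x fewer multiplies
print("kept connections:", W_sparse.nnz, "of", W.size)
print("dense vs. sparse output match:", np.allclose(y_dense, y_sparse))
```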
...