Corpus ID: 208527260

Neural Epitome Search for Architecture-Agnostic Network Compression

@article{Zhou2020NeuralES,
  title={Neural Epitome Search for Architecture-Agnostic Network Compression},
  author={Daquan Zhou and Xiaojie Jin and Qibin Hou and Kaixin Wang and Jianchao Yang and Jiashi Feng},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2020}
}
The recent WSNet [1] is a new model compression method that samples filter weights from a compact set and has been demonstrated to be effective for 1D convolutional neural networks (CNNs). However, the weight sampling strategy of WSNet is handcrafted and fixed, which may severely limit the expressive ability of the resulting CNNs and weaken its compression ability. In this work, we present a novel auto-sampling method that is applicable to both 1D and 2D CNNs with significant performance improvement… 
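To make the sampling idea concrete, below is a minimal NumPy sketch of the fixed, handcrafted slicing that WSNet-style sampling uses: every filter is an overlapping slice of one compact weight vector. The function name, stride rule, and sizes are illustrative only, not the paper's implementation.

```python
import numpy as np

def sample_filters_from_epitome(epitome, num_filters, filter_len, stride):
    """Illustrative WSNet-style sampling: each 1D filter is an overlapping
    slice of a compact shared weight vector ("epitome"), taken at a fixed
    stride. The slicing pattern here is handcrafted, which is the limitation
    a learned (auto-)sampling scheme is meant to remove."""
    filters = []
    for i in range(num_filters):
        start = (i * stride) % (len(epitome) - filter_len + 1)
        filters.append(epitome[start:start + filter_len])
    return np.stack(filters)

# A compact set of 32 weights yields 16 filters of length 9, i.e. ~4.5x fewer
# stored parameters than 16 independent filters of length 9.
epitome = np.random.randn(32)
filters = sample_filters_from_epitome(epitome, num_filters=16, filter_len=9, stride=2)
print(filters.shape)  # (16, 9)
```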
DKM: Differentiable K-Means Clustering Layer for Neural Network Compression
TLDR
This work proposes a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering based DNN model compression and demonstrates that DKM delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks.
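As a rough illustration of the idea summarized above, the sketch below implements a soft (differentiable) cluster assignment via a softmax over negative weight-to-centroid distances; the function name, temperature, and shapes are assumptions, not DKM's actual layer.

```python
import numpy as np

def dkm_soft_cluster(weights, centroids, temperature=0.05):
    """Sketch of a differentiable k-means step: each weight is softly assigned
    to every centroid via a softmax over negative distances, and the layer then
    uses the attention-weighted mixture of centroids in place of the raw weight.
    During training this is differentiable w.r.t. both weights and centroids."""
    d = np.abs(weights[:, None] - centroids[None, :])   # (num_weights, num_centroids)
    logits = -d / temperature
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ centroids                              # soft-clustered weights

w = np.random.randn(1000)
c = np.linspace(-2, 2, 16)          # 16 centroids ~ 4-bit weight clustering
w_clustered = dkm_soft_cluster(w, c)
```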
Compressing Neural Networks With Inter Prediction and Linear Transformation
TLDR
To effectively adapt the inter prediction scheme from video coding technology, the proposed method integrates a linear transformation into the prediction scheme, which significantly enhances compression efficiency.
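A hedged sketch of what an inter-prediction scheme with a linear transformation could look like for weights: fit a linear map from a reference layer to the current layer and keep only the map plus a small residual. The layer shapes, the `inter_predict` helper, and the least-squares fit are illustrative assumptions, not the method described in the paper.

```python
import numpy as np

def inter_predict(w_ref, w_cur):
    """Hedged sketch of inter prediction with a linear transformation: fit a
    matrix A so that A @ w_ref approximates w_cur (least squares). Only A and
    the (quantized / sparsified) residual then need to be stored for w_cur,
    mirroring how inter prediction in video coding stores a predictor plus a
    residual."""
    A_t, *_ = np.linalg.lstsq(w_ref.T, w_cur.T, rcond=None)
    A = A_t.T
    residual = w_cur - A @ w_ref
    return A, residual

w_ref = np.random.randn(64, 128)                          # reference layer weights
w_cur = 0.8 * w_ref + 0.05 * np.random.randn(64, 128)     # a correlated layer
A, res = inter_predict(w_ref, w_cur)
print(np.abs(res).mean(), np.abs(w_cur).mean())           # residual is much smaller
```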
Refiner: Refining Self-attention for Vision Transformers
TLDR
This work introduces a conceptually simple scheme, called refiner, to directly refine the self-attention maps of ViTs, and explores attention expansion that projects the multi-head attention maps to a higher-dimensional space to promote their diversity.
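A hedged sketch of the attention-expansion idea: project the stack of per-head attention maps along the head axis into a larger set of "expanded" heads with a learned linear map. The `expand` matrix and shapes are illustrative assumptions; the actual refiner design may differ.

```python
import numpy as np

def expand_attention_heads(attn, expand):
    """Hedged sketch of attention expansion: project the multi-head attention
    maps (H, N, N) along the head axis into a higher-dimensional space of H'
    expanded heads with a learned linear map, to promote diversity among maps.
    `expand` is an (H', H) matrix standing in for the learned projection."""
    return np.einsum('eh,hnm->enm', expand, attn)

H, N = 6, 16
attn = np.random.rand(H, N, N)
attn /= attn.sum(axis=-1, keepdims=True)        # row-normalized attention maps
expand = np.random.randn(24, H) / np.sqrt(H)    # 6 heads -> 24 expanded heads
refined = expand_attention_heads(attn, expand)
print(refined.shape)  # (24, 16, 16)
```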
AutoSpace: Neural Architecture Search with Less Human Interference
TLDR
A novel differentiable evolutionary framework named AutoSpace is proposed, which evolves the search space to an optimal one with the following novel techniques: a differentiable fitness scoring function to efficiently evaluate the performance of cells, and a reference architecture to speed up the evolution procedure and avoid falling into sub-optimal solutions.
TAda! Temporally-Adaptive Convolutions for Video Understanding
TLDR
This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.
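A hedged sketch of temporally-adaptive weight calibration: a shared base kernel is scaled per frame by factors generated from that frame's pooled descriptor. The generator (`gen_w`, `gen_b`) and all shapes are stand-ins, not TAdaConv's exact formulation.

```python
import numpy as np

def tada_calibrate(base_kernel, frame_feats, gen_w, gen_b):
    """Hedged sketch of a temporally-adaptive convolution: a single base kernel
    is calibrated per frame by factors generated from that frame's pooled
    descriptor, so each frame effectively convolves with its own weights."""
    calib = frame_feats @ gen_w + gen_b                    # (T, C_out) per-frame factors
    return calib[:, :, None, None, None] * base_kernel[None]   # (T, C_out, C_in, k, k)

T, C_in, C_out, k = 8, 16, 32, 3
base = np.random.randn(C_out, C_in, k, k)
feats = np.random.randn(T, C_in)
# biasing the generator toward 1 keeps the calibrated kernels near the base kernel
kernels = tada_calibrate(base, feats, 0.01 * np.random.randn(C_in, C_out), np.ones(C_out))
print(kernels.shape)  # (8, 32, 16, 3, 3)
```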
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
TLDR
A new Tokens-to-Token Vision Transformer (T2T-ViT) is presented, which introduces an efficient backbone with a deep-narrow structure for vision transformers motivated by CNN architecture design after extensive study, and reduces the parameter counts and MACs of vanilla ViT by 200% while achieving more than 2.5% improvement when trained from scratch on ImageNet.
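A minimal sketch of the tokens-to-token "soft split" step as it is usually described: overlapping neighborhoods of the token map are folded into single tokens, trading spatial resolution for channel depth. Kernel size, stride, and the `soft_split` helper are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def soft_split(token_map, k=3, s=2):
    """Hedged sketch of the tokens-to-token soft split: overlapping k x k
    neighborhoods of the token map are concatenated into single tokens, so the
    token count shrinks while local structure is folded into the channel dim."""
    H, W, C = token_map.shape
    windows = sliding_window_view(token_map, (k, k), axis=(0, 1))[::s, ::s]
    Hp, Wp = windows.shape[:2]               # windows: (H', W', C, k, k)
    return windows.reshape(Hp * Wp, C * k * k)

tokens = soft_split(np.random.randn(14, 14, 64), k=3, s=2)
print(tokens.shape)  # (36, 576): 196 tokens of dim 64 become 36 tokens of dim 576
```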
Rethinking Bottleneck Structure for Efficient Mobile Network Design
TLDR
This paper proposes to flip the structure and presents a novel bottleneck design, called the sandglass block, that performs identity mapping and spatial transformation at higher dimensions and thus alleviates information loss and gradient confusion effectively; the block is also added into the search space of the neural architecture search method DARTS.

References

SHOWING 1-10 OF 44 REFERENCES
WSNet: Compact and Efficient Networks with Weight Sampling
TLDR
It is demonstrated that such a novel weight sampling approach (and the induced WSNet) promotes both weight and computation sharing favorably, and can more efficiently learn much smaller networks with competitive performance compared to baseline networks with equal numbers of convolution filters.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
A new scaling method is proposed that uniformly scales all dimensions of depth, width, and resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated on scaling up MobileNets and ResNet.
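The compound coefficient can be illustrated in a few lines: a single exponent phi scales depth, width, and resolution together. The alpha/beta/gamma values below are the ones reported for EfficientNet-B0; the helper itself is only a sketch.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Sketch of compound scaling: one coefficient phi scales depth, width and
    resolution together via d = alpha**phi, w = beta**phi, r = gamma**phi,
    with alpha * beta**2 * gamma**2 ~= 2, so FLOPs grow roughly as 2**phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

for phi in range(1, 4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```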
Speeding up Convolutional Neural Networks with Low Rank Expansions
TLDR
Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain.
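The rank-1 spatial idea reduces to an SVD truncation per filter, sketched below with NumPy; the helper name and the way the singular value is split between the two 1D filters are illustrative choices.

```python
import numpy as np

def rank1_factorize(kernel):
    """Sketch of the spatial low-rank idea: approximate a k x k filter with the
    outer product of a vertical (k x 1) and horizontal (1 x k) filter (its best
    rank-1 approximation via SVD), so one 2D convolution can be replaced by two
    cheaper 1D convolutions."""
    U, S, Vt = np.linalg.svd(kernel)
    v = U[:, 0] * np.sqrt(S[0])   # k x 1 vertical filter
    h = Vt[0] * np.sqrt(S[0])     # 1 x k horizontal filter
    return v, h

k = np.random.randn(5, 5)
v, h = rank1_factorize(k)
print(np.linalg.norm(k - np.outer(v, h)))  # rank-1 approximation error
```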
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
TLDR
This work introduces "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search
TLDR
This work proposes a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods.
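A hedged sketch of the differentiable relaxation behind DNAS: a layer's output is a Gumbel-softmax-weighted mixture of candidate operations, so the architecture logits receive gradients. The candidate ops and temperature below are toy stand-ins.

```python
import numpy as np

def dnas_mixed_op(x, candidate_ops, arch_logits, tau=1.0, rng=np.random):
    """Hedged sketch of the DNAS relaxation: a layer's output is the
    Gumbel-softmax-weighted sum of all candidate ops' outputs, so the
    architecture parameters (arch_logits) can be trained with gradients
    instead of enumerating and training architectures separately."""
    g = -np.log(-np.log(rng.uniform(size=len(candidate_ops))))  # Gumbel noise
    w = np.exp((arch_logits + g) / tau)
    w /= w.sum()
    return sum(wi * op(x) for wi, op in zip(w, candidate_ops))

ops = [lambda x: x, lambda x: 2 * x, lambda x: np.tanh(x)]  # stand-in candidates
out = dnas_mixed_op(np.random.randn(4), ops, arch_logits=np.zeros(3))
```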
ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions
TLDR
This work proposes to compress deep models by using channel-wise convolutions, which replace dense connections among feature maps with sparse ones in CNNs, and builds light-weight CNNs known as ChannelNets.
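A small sketch of a channel-wise convolution: a short 1D kernel slides over the channel axis instead of a dense channel-to-channel projection. Shapes and the helper name are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def channel_wise_conv(feat, kernel):
    """Sketch of a channel-wise convolution: instead of a dense 1x1 convolution
    connecting every input channel to every output channel, a small 1D kernel
    slides over the channel dimension, so connections are sparse and weights
    are shared across channel positions."""
    win = sliding_window_view(feat, len(kernel), axis=0)   # (C - k + 1, H, W, k)
    return np.tensordot(win, kernel, axes=([-1], [0]))     # (C - k + 1, H, W)

feat = np.random.randn(64, 8, 8)
out = channel_wise_conv(feat, kernel=np.random.randn(9))
print(out.shape)  # (56, 8, 8): 9 weights replace a dense 64 x 56 projection
```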
Learning Transferable Architectures for Scalable Image Recognition
TLDR
This paper proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset and introduces a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.
AMC: AutoML for Model Compression and Acceleration on Mobile Devices
TLDR
This paper proposes AutoML for Model Compression (AMC), which leverages reinforcement learning to efficiently sample the design space and improve the model compression quality, achieving state-of-the-art model compression results in a fully automated way without any human effort.
Low-bit Quantization of Neural Networks for Efficient Inference
TLDR
This paper formalizes the linear quantization task as a Minimum Mean Squared Error (MMSE) problem for both weights and activations, allowing low-bit precision inference without the need for full network retraining.
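A minimal sketch of MMSE-style linear quantization: search over clipping thresholds and keep the uniform quantizer with the lowest mean squared error. The grid search, bit width, and helper name are illustrative, not the paper's exact solver.

```python
import numpy as np

def mmse_quantize(w, bits=4, num_candidates=100):
    """Sketch of MMSE linear quantization: pick the clipping threshold whose
    uniform quantizer minimizes the mean squared error to the original tensor,
    rather than simply scaling by the tensor's maximum absolute value."""
    qmax = 2 ** (bits - 1) - 1
    best = (np.inf, None, None)
    for t in np.linspace(0.1, 1.0, num_candidates) * np.abs(w).max():
        scale = t / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
        mse = np.mean((w - q) ** 2)
        if mse < best[0]:
            best = (mse, scale, q)
    return best[2], best[1]

w = np.random.randn(4096)
w_q, scale = mmse_quantize(w)
print(np.mean((w - w_q) ** 2))  # quantization error under the chosen scale
```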