Corpus ID: 208527260

Neural Epitome Search for Architecture-Agnostic Network Compression

Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng
arXiv: Computer Vision and Pattern Recognition
The recent WSNet [1] is a new model compression method that samples filter weights from a compact set and has been shown to be effective for 1D convolutional neural networks (CNNs). However, the weight sampling strategy of WSNet is handcrafted and fixed, which may severely limit the expressive ability of the resulting CNNs and weaken its compression ability. In this work, we present a novel auto-sampling method that is applicable to both 1D and 2D CNNs with significant performance improvement…
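A minimal sketch of the kind of fixed, handcrafted sampling the abstract attributes to WSNet, assuming each 1D filter is a contiguous window of one shared compact weight vector and that windows start a fixed `stride` apart. The function name `sample_filters`, the stride, and the wrap-around rule are illustrative assumptions rather than WSNet's published settings, and the learned auto-sampling proposed in this work is not shown.

```python
def sample_filters(compact, num_filters, filter_len, stride):
    """Slice `num_filters` overlapping 1D filters from a shared compact vector.

    Hypothetical fixed sampling: filter f starts at offset f * stride and
    wraps around the end of `compact` (an assumption for illustration).
    """
    n = len(compact)
    filters = []
    for f in range(num_filters):
        start = f * stride
        filters.append([compact[(start + i) % n] for i in range(filter_len)])
    return filters


compact = [0.1, -0.2, 0.3, 0.05, -0.4, 0.25]  # 6 stored parameters
filters = sample_filters(compact, num_filters=4, filter_len=3, stride=1)
# 4 filters x 3 taps = 12 effective weights backed by only 6 stored values.
print(filters)
```

Because the offsets are fixed in advance, the sampled filters cannot adapt to the data; making the sampling itself learnable is, per the abstract, what the auto-sampling method addresses.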

DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

This work proposes a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering based DNN model compression and demonstrates that DKM delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks.
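DKM casts clustering as attention: each weight attends to every centroid through a softmax over negative distances, and the centroids are updated as attention-weighted means, which keeps the whole step differentiable. The sketch below applies that idea to scalar weights only; the temperature, the iteration count, and the name `dkm_soft_cluster` are assumptions for illustration, and the published layer operates on weight tensors during training.

```python
import math


def dkm_soft_cluster(weights, centroids, temperature=0.05, iters=5):
    """Differentiable-k-means-style soft clustering of scalar weights.

    Each weight attends to all centroids via a softmax over negative
    distances; centroids become attention-weighted means of the weights.
    Returns the final centroids and the soft-clustered weights.
    """
    centroids = list(centroids)
    attn = []
    for _ in range(iters):
        attn = []
        for w in weights:
            logits = [-abs(w - c) / temperature for c in centroids]
            m = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(l - m) for l in logits]
            total = sum(exps)
            attn.append([e / total for e in exps])
        centroids = [
            sum(row[j] * w for row, w in zip(attn, weights))
            / (sum(row[j] for row in attn) + 1e-12)
            for j in range(len(centroids))
        ]
    # Each weight is replaced by its attention-weighted mix of centroids.
    clustered = [sum(a * c for a, c in zip(row, centroids)) for row in attn]
    return centroids, clustered


weights = [-0.52, -0.48, -0.5, 0.49, 0.51, 0.5]
centroids, clustered = dkm_soft_cluster(weights, [-0.1, 0.1])
print(centroids, clustered)
```

A lower temperature makes the assignment closer to hard k-means; a higher one keeps gradients flowing to more centroids.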

TAda! Temporally-Adaptive Convolutions for Video Understanding

This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.
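A hedged sketch of temporally adaptive weights: one shared base kernel is rescaled per frame by a calibration factor computed from local temporal context, so each frame gets its own kernel without storing a separate kernel per frame. The calibration rule here (1 plus a scaled mean of the descriptors of frames t-1, t, t+1) and the names `calibration_factors` and `tada_kernels` are hypothetical; TAdaConv generates its factors with a learned module.

```python
def calibration_factors(frame_desc, scale=0.1):
    """Hypothetical per-frame calibration alpha_t from local temporal context:
    1 + scale * mean descriptor of frames t-1, t, t+1 (clipped at the ends)."""
    T = len(frame_desc)
    alphas = []
    for t in range(T):
        window = frame_desc[max(0, t - 1): min(T, t + 2)]
        alphas.append(1.0 + scale * sum(window) / len(window))
    return alphas


def tada_kernels(base_kernel, frame_desc, scale=0.1):
    """One kernel per frame: W_t = alpha_t * W_base (base weights are shared)."""
    return [[a * w for w in base_kernel]
            for a in calibration_factors(frame_desc, scale)]


kernels = tada_kernels([1.0, -2.0, 0.5], [0.0, 1.0, 2.0, 1.0])
print(kernels)
```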

Refiner: Refining Self-attention for Vision Transformers

This work introduces a conceptually simple scheme, called refiner, to directly refine the self-attention maps of ViTs, and explores attention expansion, which projects the multi-head attention maps to a higher-dimensional space to promote their diversity.
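The attention-expansion step can be pictured as a linear mixing of heads: H attention maps are combined into H' > H maps with an H' x H matrix applied at every query/key position. This is a sketch of that projection only, with a hypothetical fixed `mix` matrix standing in for learned weights; any further refinement the method applies to the expanded maps is not shown.

```python
def expand_heads(attn_maps, mix):
    """Project H attention maps (each N x N) to H' = len(mix) maps.

    mix is an H' x H matrix; output map k is sum_h mix[k][h] * attn_maps[h],
    i.e. a linear head-mixing applied at every (i, j) position.
    """
    num_heads = len(attn_maps)
    n = len(attn_maps[0])
    if any(len(row) != num_heads for row in mix):
        raise ValueError("each row of mix must have one entry per input head")
    return [
        [[sum(row[h] * attn_maps[h][i][j] for h in range(num_heads))
          for j in range(n)]
         for i in range(n)]
        for row in mix
    ]


heads = [[[1.0, 0.0], [0.0, 1.0]],   # head 0 attends to itself
         [[0.0, 1.0], [1.0, 0.0]]]   # head 1 attends to the other token
mix = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical 2 -> 3 expansion
expanded = expand_heads(heads, mix)
print(len(expanded), expanded[2])
```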

Compressing Neural Networks With Inter Prediction and Linear Transformation

To effectively adapt the inter prediction scheme from video coding technology, the proposed method integrates a linear transformation into the prediction scheme, which significantly enhances compression efficiency.
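A minimal sketch of inter prediction for weights, assuming the simplest possible linear transformation: the current layer's weights are predicted from a reference layer's weights as a * ref + b, fitted by least squares, so only a, b, and the residual need to be kept. The scalar affine form and the name `predict_with_linear_transform` are illustrative assumptions; the paper's transformation may be richer.

```python
def predict_with_linear_transform(ref, cur):
    """Fit cur ~= a * ref + b by least squares and return (a, b, residual).

    Only a, b and the residual would need to be stored; in a typical coding
    pipeline the residual is what gets quantized or entropy-coded.
    """
    n = len(ref)
    mean_r = sum(ref) / n
    mean_c = sum(cur) / n
    var_r = sum((r - mean_r) ** 2 for r in ref)
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(ref, cur))
    a = cov / var_r if var_r else 0.0
    b = mean_c - a * mean_r
    residual = [c - (a * r + b) for r, c in zip(ref, cur)]
    return a, b, residual


ref_layer = [0.2, -0.1, 0.4, 0.0]
cur_layer = [0.45, -0.15, 0.85, 0.05]  # exactly 2 * ref + 0.05 here
a, b, residual = predict_with_linear_transform(ref_layer, cur_layer)
print(a, b, residual)
```

The better the transformation aligns the reference with the current layer, the smaller the residual, which is why adding the linear transformation can help compression.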

AutoSpace: Neural Architecture Search with Less Human Interference

A novel differentiable evolutionary framework named AutoSpace is proposed, which evolves the search space to an optimal one with the following novel techniques: a differentiable fitness scoring function to efficiently evaluate the performance of cells, and a reference architecture to speed up the evolution procedure and avoid falling into sub-optimal solutions.

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

A new Tokens-to-Token Vision Transformer (T2T-ViT) is proposed, which incorporates an efficient backbone with a deep-narrow structure for vision transformers, motivated by CNN architecture design after an empirical study, and reduces the parameter count and MACs of vanilla ViT by half.
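The token count in a tokens-to-token pipeline follows the usual unfold arithmetic: an overlapping split with kernel k, stride s, and padding p maps a side of length H to (H + 2p - k) // s + 1 tokens. The three (kernel, stride, padding) settings below are assumptions chosen for illustration of how a 224 x 224 input shrinks step by step; check them against the T2T-ViT paper before relying on them.

```python
def tokens_after_soft_split(size, kernel, stride, padding):
    """Side length after an overlapping unfold with kernel/stride/padding."""
    return (size + 2 * padding - kernel) // stride + 1


side = 224
sides = []
for kernel, stride, padding in [(7, 4, 2), (3, 2, 1), (3, 2, 1)]:  # assumed settings
    side = tokens_after_soft_split(side, kernel, stride, padding)
    sides.append(side)
print(sides, side * side)
```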

Rethinking Bottleneck Structure for Efficient Mobile Network Design

This paper proposes to flip the structure and presents a novel bottleneck design, called the sandglass block, which performs identity mapping and spatial transformation at higher dimensions and thus effectively alleviates information loss and gradient confusion; the block is also added to the search space of the neural architecture search method DARTS.



AMC: AutoML for Model Compression and Acceleration on Mobile Devices

This paper proposes AutoML for Model Compression (AMC), which leverages reinforcement learning to efficiently sample the design space, improves model compression quality, and achieves state-of-the-art model compression results in a fully automated way without human effort.

MobileNetV2: Inverted Residuals and Linear Bottlenecks

A new mobile architecture, MobileNetV2, is described that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes and allows decoupling of the input/output domains from the expressiveness of the transformation.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
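In the standard notation for residual networks, a block with input x and stacked layers F with weights {W_i} produces the first line below; when the dimensions of x and F differ, a linear projection W_s is applied on the shortcut (second line). The third line is a one-step consequence of the identity shortcut, included as an illustration of one common intuition for why these networks are easier to optimize: the identity term passes gradients through unchanged.

```latex
\begin{aligned}
y &= \mathcal{F}(x, \{W_i\}) + x \\
y &= \mathcal{F}(x, \{W_i\}) + W_s x \\
\frac{\partial y}{\partial x} &= \frac{\partial \mathcal{F}}{\partial x} + I \quad \text{(identity shortcut)}
\end{aligned}
```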

AutoSlim: Towards One-Shot Architecture Search for Channel Numbers

A simple and one-shot solution to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size) is presented.

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

A new scaling method is proposed that uniformly scales all dimensions of depth, width, and resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated on scaling up MobileNets and ResNet.
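Compound scaling sets depth d = alpha^phi, width w = beta^phi, and resolution r = gamma^phi under the constraint alpha * beta^2 * gamma^2 ≈ 2, so total FLOPs grow by roughly 2^phi. The defaults alpha = 1.2, beta = 1.1, gamma = 1.15 are the values reported for the EfficientNet baseline; treat them as illustrative here, and `compound_scale` is a hypothetical helper name.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Depth, width and resolution multipliers for compound coefficient phi.

    FLOPs of a conv net scale roughly with depth * width**2 * resolution**2,
    so the FLOPs factor is (alpha * beta**2 * gamma**2) ** phi, about 2 ** phi.
    """
    depth = alpha ** phi
    width = beta ** phi
    resolution = gamma ** phi
    flops_factor = (alpha * beta ** 2 * gamma ** 2) ** phi
    return depth, width, resolution, flops_factor


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Width and resolution enter squared because a convolution's cost grows with both the number of input and output channels and with the number of spatial positions.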

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution

This work proposes to factorize the mixed feature maps by their frequencies and designs a novel Octave Convolution (OctConv) operation that stores and processes feature maps that vary spatially “slower” at a lower spatial resolution, reducing both memory and computation cost.
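The savings can be worked out directly. If a fraction alpha of channels is kept at half height and half width (1/4 of the positions), feature-map memory drops to (1 - alpha) + alpha / 4. For convolution cost, of the four paths (high to high, high to low, low to high, low to low) only high to high runs at full resolution, giving (1 - alpha)^2 + alpha * (2 - alpha) / 4. This sketch assumes the same alpha for input and output and ignores pooling and upsampling cost; `octconv_cost_ratios` is a hypothetical helper.

```python
def octconv_cost_ratios(alpha):
    """Feature-map memory and conv FLOPs of OctConv relative to a vanilla conv.

    A fraction `alpha` of channels is stored at half height and half width
    (1/4 of the positions). Only the high->high path runs at full resolution;
    the other three paths run at 1/4 of the positions.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    memory = (1.0 - alpha) + alpha / 4.0
    flops = (1.0 - alpha) ** 2 + alpha * (2.0 - alpha) / 4.0
    return memory, flops


for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    mem, flops = octconv_cost_ratios(alpha)
    print(f"alpha={alpha}: memory {mem:.3f}, FLOPs {flops:.3f}")
```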

ASAP: Architecture Search, Anneal and Prune

A differentiable search space is proposed that allows the annealing of architecture weights while gradually pruning inferior operations; in this way, the search converges to a single output network in a continuous manner.
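A hedged sketch of anneal-and-prune on one edge of a cell: candidate operations get softmax weights with a temperature, the temperature is lowered step by step, and operations whose weight falls below a threshold are dropped until one remains. The fixed logits, the constant `threshold`, the temperature list, and the name `anneal_and_prune` are assumptions; in the actual search the architecture weights are trained between steps and the threshold follows its own schedule.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    m = max(logits)
    exps = [math.exp((l - m) / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def anneal_and_prune(logits, temperatures, threshold):
    """Return the indices of operations that survive annealing and pruning."""
    alive = list(range(len(logits)))
    for t in temperatures:
        probs = softmax([logits[i] for i in alive], t)
        best = alive[probs.index(max(probs))]
        # Keep operations above the threshold; never prune the strongest one.
        alive = [i for i, p in zip(alive, probs) if p >= threshold or i == best]
        if len(alive) == 1:
            break
    return alive


survivors = anneal_and_prune([2.0, 1.0, 0.5, -1.0], [1.0, 0.5, 0.25, 0.1], 0.2)
print(survivors)
```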

Network Slimming by Slimmable Networks: Towards One-Shot Architecture Search for Channel Numbers

A simple and one-shot solution to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size) is presented.

Universally Slimmable Networks and Improved Training Techniques

This work proposes a systematic approach to train universally slimmable networks (US-Nets), extending slimmable networks to execute at arbitrary width and generalizing to networks both with and without batch normalization layers, and opens up the possibility to directly evaluate the FLOPs-Accuracy spectrum of network architectures.
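One of the US-Nets training techniques is the sandwich rule: in each iteration, train the largest and the smallest width together with a few randomly sampled widths in between, since the extremes bound the performance of all intermediate widths. The width range 0.25 to 1.0, the choice of four widths per iteration, and the helper name `sandwich_widths` are illustrative assumptions.

```python
import random


def sandwich_widths(min_width=0.25, max_width=1.0, n=4, rng=random):
    """Width multipliers to train in one iteration under the sandwich rule:
    always the largest and the smallest width, plus n - 2 random ones between."""
    if n < 2:
        raise ValueError("the sandwich rule needs at least 2 widths")
    randoms = [rng.uniform(min_width, max_width) for _ in range(n - 2)]
    return [max_width, min_width] + randoms


widths = sandwich_widths(rng=random.Random(0))  # seeded for reproducibility
print(widths)
```

Passing an explicit `random.Random` instance keeps the sampled widths reproducible across runs.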

Partial Order Pruning: For Best Speed/Accuracy Trade-Off in Neural Architecture Search

This work proposes an algorithm that can offer better speed/accuracy trade-off of searched networks, and presents several Dongfeng (DF) networks that provide high accuracy and fast inference speed on various application GPU platforms.