You Look Twice: GaterNet for Dynamic Filter Selection in CNNs

@inproceedings{Chen2019YouLT,
  title={You Look Twice: GaterNet for Dynamic Filter Selection in CNNs},
  author={Zhourong Chen and Yang Li and Samy Bengio and Si Si},
  booktitle={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={9164-9172}
}
  • Zhourong Chen, Yang Li, Samy Bengio, Si Si
  • Published 2019
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample it is processing. In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). The problem is interesting because the idea of forcing different parts of the model to learn from different types of samples may help us acquire better filters in CNNs, improve the…
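To make the abstract's idea of input-dependent filter selection concrete, here is a minimal sketch in which a lightweight gater sub-network inspects the input and emits one gate per backbone filter. The class name, layer sizes, and the soft sigmoid gate are illustrative assumptions; GaterNet itself learns discrete gates with a dedicated gater network rather than the simplified module shown here.

import torch
import torch.nn as nn

class GatedBackbone(nn.Module):
    """Illustrative input-dependent filter gating: a gater sub-network maps the
    input image to one gate per backbone filter, and the gates switch the
    corresponding feature maps on or off (softly, via a sigmoid)."""

    def __init__(self, in_ch=3, mid_ch=64, num_classes=10):
        super().__init__()
        self.backbone_conv = nn.Conv2d(in_ch, mid_ch, 3, padding=1)
        self.head = nn.Linear(mid_ch, num_classes)
        self.gater = nn.Sequential(               # lightweight gate generator
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, mid_ch),
            nn.Sigmoid(),                         # soft gates in [0, 1]
        )

    def forward(self, x):
        gates = self.gater(x)                     # (B, mid_ch), one gate per filter
        feats = torch.relu(self.backbone_conv(x)) # (B, mid_ch, H, W)
        feats = feats * gates[:, :, None, None]   # select/suppress filters per input
        return self.head(feats.mean(dim=(2, 3)))  # global average pool + classifier

model = GatedBackbone()
logits = model(torch.randn(2, 3, 32, 32))         # -> shape (2, 10)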
Batch-shaping for learning conditional channel gated networks
TLDR: This work introduces a new residual block architecture that gates convolutional channels in a fine-grained manner, together with a generally applicable tool that matches the marginal aggregate posteriors of features in a neural network to a pre-specified prior distribution.
Class-specific early exit design methodology for convolutional neural networks
TLDR: This paper proposes a methodology for designing early-exit networks from a given baseline model, aiming to improve the average latency for a targeted subset of classes while preserving the original accuracy for all classes.
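The early-exit mechanism this line of work builds on can be sketched as follows: a side classifier after an early block lets confident inputs leave the network before the deeper layers run. This is a generic illustration under assumed layer sizes and a hypothetical confidence threshold, not the paper's class-specific design methodology.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    """Generic early-exit sketch: a side classifier after the first block lets
    confident samples leave before the deeper (more expensive) block runs."""

    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.threshold = threshold                 # hypothetical confidence cut-off
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.exit1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(32, num_classes))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.exit2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(64, num_classes))

    def forward(self, x):
        h = self.block1(x)
        early_logits = self.exit1(h)
        if not self.training:                      # exit only at inference time
            conf = F.softmax(early_logits, dim=1).max(dim=1).values
            if bool((conf >= self.threshold).all()):
                return early_logits                # all samples confident: stop here
        return self.exit2(self.block2(h))          # otherwise run the deeper block

net = EarlyExitNet().eval()
out = net(torch.randn(1, 3, 32, 32))               # -> shape (1, 10)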
Dynamic Channel and Layer Gating in Convolutional Neural Networks
TLDR: It is argued that combining the recently proposed channel gating mechanism with layer gating can significantly reduce the computational cost of large CNNs.
Conditional Channel Gated Networks for Task-Aware Continual Learning
TLDR: This work equips each convolutional layer with task-specific gating modules, selecting which filters to apply on the given input, and introduces a task classifier that predicts the task label of each example, to deal with settings in which a task oracle is not available.
Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference
TLDR: Visualizations indicate a smooth and consistent transition in the DFS behaviors, especially in the learned choices between layer skipping and different quantizations as the total computational budget varies, validating the hypothesis that layer quantization can be viewed as an intermediate variant of layer skipping.
DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers
  • Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, Xiaojun Chang
  • Computer Science
  • ArXiv
  • 2021
TLDR: A hardware-efficient dynamic inference regime, named dynamic weight slicing, adaptively slices a part of the network parameters for inference while keeping them stored statically and contiguously in hardware, avoiding the extra burden of sparse computation.
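A minimal sketch of the weight-slicing idea, under the assumption that a router has already chosen a slimming ratio: the full kernel stays stored contiguously and only a leading slice of output filters is convolved, so the reduced computation remains dense.

import torch
import torch.nn.functional as F

# Dynamic weight slicing, sketched: the full kernel is stored once and a
# router-chosen ratio selects the leading output filters for a thinner,
# still-dense convolution (no sparse indexing involved).
full_weight = torch.randn(64, 3, 3, 3)            # (out_ch, in_ch, kH, kW)
x = torch.randn(1, 3, 32, 32)

ratio = 0.5                                       # assume a router predicted this
k = int(full_weight.shape[0] * ratio)
y = F.conv2d(x, full_weight[:k], padding=1)       # contiguous slice of the kernel
print(y.shape)                                    # torch.Size([1, 32, 32, 32])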
BasisNet: Two-stage Model Synthesis for Efficient Inference
TLDR: BasisNet is presented, which combines recent advancements in efficient neural network architectures, conditional computation, and early termination in a simple new form, and shows that proper training recipes are critical for increasing generalizability for such high-capacity neural networks.
Pruning Blocks for CNN Compression and Acceleration via Online Ensemble Distillation
TLDR: This paper proposes an online ensemble distillation method that automatically prunes blocks/layers of a target network by transferring knowledge from a strong teacher in an end-to-end manner, and employs the fast iterative shrinkage-thresholding algorithm (FISTA) to quickly and reliably remove the redundant blocks.
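The shrinkage step at the heart of the (fast) iterative shrinkage-thresholding algorithm can be sketched as below; applying it to hypothetical block-level scaling factors drives small factors exactly to zero, which is what makes block removal possible. This shows only the proximal operator, not the paper's full distillation pipeline.

import torch

def soft_threshold(v: torch.Tensor, lam: float) -> torch.Tensor:
    """Shrinkage (proximal) step of ISTA/FISTA: magnitudes below lam collapse
    to exactly zero, so sparsity emerges in the thresholded variables."""
    return torch.sign(v) * torch.clamp(v.abs() - lam, min=0.0)

# Hypothetical block-level scaling factors; blocks whose factor hits zero
# would be candidates for removal.
block_scales = torch.tensor([0.9, 0.05, -0.4, 0.01])
print(soft_threshold(block_scales, 0.1))   # tensor([ 0.8000,  0.0000, -0.3000,  0.0000])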
Dynamic Neural Network Decoupling
TLDR: A novel architecture decoupling method is proposed that dynamically discovers the hierarchical path of activated filters for each input image, and subsequently disentangles the filters by limiting their outputs during training.
CondConv: Conditionally Parameterized Convolutions for Efficient Inference
TLDR: This work proposes conditionally parameterized convolutions (CondConv), which learn specialized convolutional kernels for each example, and demonstrates that scaling networks with CondConv improves the performance and inference-cost trade-off of several existing convolutional neural network architectures on both classification and detection tasks.
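A hedged sketch of the CondConv idea: per-example routing weights mix a small bank of expert kernels into one kernel before the convolution is applied. The expert count, initialization, and the naive per-sample loop are illustrative simplifications.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConv2d(nn.Module):
    """Sketch of a conditionally parameterized convolution: routing weights
    computed from the input mix several expert kernels into one per-example
    kernel, which is then used for an ordinary dense convolution."""

    def __init__(self, in_ch, out_ch, k=3, num_experts=4):
        super().__init__()
        self.experts = nn.Parameter(0.02 * torch.randn(num_experts, out_ch, in_ch, k, k))
        self.route = nn.Linear(in_ch, num_experts)
        self.padding = k // 2

    def forward(self, x):
        r = torch.sigmoid(self.route(x.mean(dim=(2, 3))))     # (B, E) routing weights
        outs = []
        for i in range(x.shape[0]):                           # naive per-example loop
            w = (r[i][:, None, None, None, None] * self.experts).sum(dim=0)
            outs.append(F.conv2d(x[i:i + 1], w, padding=self.padding))
        return torch.cat(outs, dim=0)

layer = CondConv2d(3, 16)
print(layer(torch.randn(2, 3, 32, 32)).shape)                 # torch.Size([2, 16, 32, 32])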

References

Showing 1-10 of 32 references
Convolutional Networks with Adaptive Inference Graphs
TLDR: This work proposes convolutional networks with adaptive inference graphs (ConvNet-AIG) that adaptively define their network topology conditioned on the input image, and shows that they exhibit higher robustness than ResNets, complementing other known defense mechanisms.
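A sketch of the adaptive-inference-graph idea at the block level: a tiny gate network looks at the block's input and decides whether the residual branch should contribute. The hard sigmoid decision is an assumption standing in for a trainable discrete gate, and a real implementation would skip the branch computation entirely when the gate is closed.

import torch
import torch.nn as nn

class GatedResBlock(nn.Module):
    """Sketch of input-conditioned block gating: a tiny gate network decides
    from the block's input whether the residual branch contributes at all."""

    def __init__(self, ch):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, 1))

    def forward(self, x):
        keep = (torch.sigmoid(self.gate(x)) > 0.5).float()    # (B, 1) hard decision
        # For clarity the branch is always evaluated here; a real implementation
        # would skip it whenever the gate is closed.
        return x + keep[:, :, None, None] * self.branch(x)

blk = GatedResBlock(16)
print(blk(torch.randn(2, 16, 8, 8)).shape)                    # torch.Size([2, 16, 8, 8])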
Learning Transferable Architectures for Scalable Image Recognition
TLDR: This paper proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset, and introduces a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.
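The drop-path regularizer referenced here can be sketched as below: during training, an entire branch's output is zeroed per sample with some probability and the survivors are rescaled; ScheduledDropPath additionally ramps that probability up over the course of training. The function shown is a generic drop-path sketch, not the NASNet code.

import torch

def drop_path(x: torch.Tensor, drop_prob: float) -> torch.Tensor:
    """Drop-path sketch: zero an entire branch output per sample with
    probability drop_prob and rescale the survivors; pass 0.0 at eval time."""
    if drop_prob == 0.0:
        return x
    keep = 1.0 - drop_prob
    mask = torch.rand(x.shape[0], *([1] * (x.dim() - 1)), device=x.device) < keep
    return x * mask / keep

y = drop_path(torch.randn(4, 16, 8, 8), drop_prob=0.2)        # some samples' branch zeroed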
Deep Networks with Internal Selective Attention through Feedback Connections
TLDR: DasNet harnesses the power of sequential processing to improve classification performance by allowing the network to iteratively focus its internal attention on some of its convolutional filters.
Improved Regularization of Convolutional Neural Networks with Cutout
TLDR: This paper shows that the simple regularization technique of randomly masking out square regions of the input during training, which is called cutout, can be used to improve the robustness and overall performance of convolutional neural networks.
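Cutout is simple enough to sketch directly: zero out a randomly positioned square patch of the input image during training. The patch size used below is an arbitrary illustrative default.

import torch

def cutout(img: torch.Tensor, size: int = 8) -> torch.Tensor:
    """Zero a randomly centred square patch of a (C, H, W) image tensor;
    patches overlapping the border are simply clipped."""
    _, h, w = img.shape
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x1, x2 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = img.clone()
    out[:, y1:y2, x1:x2] = 0.0
    return out

augmented = cutout(torch.rand(3, 32, 32))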
Squeeze-and-Excitation Networks
  • Jie Hu, Li Shen, Gang Sun
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
TLDR: This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
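The SE block's squeeze-excite-rescale pattern can be sketched as follows; the reduction ratio of 16 is the commonly used default, and this minimal module omits details such as weight initialization.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation sketch: global-average-pool the spatial dims
    (squeeze), pass through a small bottleneck MLP (excite), and rescale
    each channel of the input feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):                          # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))                     # squeeze
        w = self.fc(s)[:, :, None, None]           # per-channel excitation weights
        return x * w                               # recalibrated feature map

print(SEBlock(64)(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])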
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
TLDR: This work introduces a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks, and applies the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora.
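A sketch of the sparsely-gated mixture-of-experts layer: a gating network scores all experts, only the top-k are evaluated per input, and their outputs are combined with renormalized gate weights. The noise term and load-balancing losses described in the paper are omitted, and the per-sample loop is for clarity rather than efficiency.

import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Sparsely gated mixture-of-experts sketch: score all experts, evaluate
    only the top-k per input, and combine with renormalised gate weights."""

    def __init__(self, dim=32, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_experts)])

    def forward(self, x):                           # x: (B, dim)
        topv, topi = self.gate(x).topk(self.k, dim=-1)
        weights = torch.softmax(topv, dim=-1)       # renormalise over the chosen k
        out = torch.zeros_like(x)
        for b in range(x.shape[0]):                 # naive dispatch loop for clarity
            for slot in range(self.k):
                expert = self.experts[topi[b, slot].item()]
                out[b] += weights[b, slot] * expert(x[b])
        return out

print(SparseMoE()(torch.randn(4, 32)).shape)        # torch.Size([4, 32])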
Deep Residual Learning for Image Recognition
TLDR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth.
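The residual-learning idea reduces to a small block: the layers learn a residual function F(x) and an identity shortcut adds the input back, so the block outputs F(x) + x. The sketch below is a generic basic block with assumed channel counts, not the exact ResNet configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResBlock(nn.Module):
    """Residual-learning sketch: the convolutions learn F(x) and the identity
    shortcut adds x back, so the block computes F(x) + x."""

    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                      # identity shortcut

print(BasicResBlock(16)(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])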
Densely Connected Convolutional Networks
TLDR: The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion and has several compelling advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
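Dense connectivity can be sketched in a few lines: each layer receives the concatenation of all preceding feature maps and adds a fixed number of new channels (the growth rate). The layer count and growth rate below are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlock(nn.Module):
    """Dense-connectivity sketch: every layer sees the concatenation of all
    previous feature maps and contributes `growth` new channels."""

    def __init__(self, in_ch=16, growth=12, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Conv2d(ch, growth, 3, padding=1))
            ch += growth                            # the next layer sees everything so far

    def forward(self, x):
        feats = [x]
        for conv in self.layers:
            feats.append(F.relu(conv(torch.cat(feats, dim=1))))   # feature reuse
        return torch.cat(feats, dim=1)

print(DenseBlock()(torch.randn(1, 16, 8, 8)).shape)       # torch.Size([1, 52, 8, 8])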
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve…
Conditional Computation in Neural Networks for faster models
TLDR: This paper applies a policy gradient algorithm for learning policies that optimize this loss function, proposes a regularization mechanism that encourages diversification of the dropout policy, and presents encouraging empirical results showing that this approach improves the speed of computation without impacting the quality of the approximation.