Aggregated Residual Transformations for Deep Neural Networks

@article{Xie2017AggregatedRT,
  title={Aggregated Residual Transformations for Deep Neural Networks},
  author={Saining Xie and Ross B. Girshick and Piotr Doll{\'a}r and Zhuowen Tu and Kaiming He},
  journal={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={5987-5995}
}
We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call cardinality (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and… 
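The building block is easiest to see in code. Below is a minimal sketch of one aggregated-transformation (ResNeXt) bottleneck block in PyTorch, written in the grouped-convolution form the paper shows to be equivalent to explicitly summing the parallel branches; the widths follow the common 32x4d setting and all names are illustrative.

import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """One aggregated-transformation bottleneck block, using a grouped
    convolution in place of `cardinality` explicit same-topology branches."""

    def __init__(self, channels=256, bottleneck_width=128, cardinality=32):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(channels, bottleneck_width, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck_width),
            nn.ReLU(inplace=True),
            # 32 parallel 3x3 paths of width 4 each (the "32x4d" setting)
            nn.Conv2d(bottleneck_width, bottleneck_width, kernel_size=3,
                      padding=1, groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck_width),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_width, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # aggregated transformations plus the residual (identity) connection
        return self.relu(x + self.transform(x))

block = ResNeXtBlock()
out = block(torch.randn(2, 256, 56, 56))  # shape preserved: (2, 256, 56, 56)

Raising cardinality at fixed complexity is the paper's point: widening the set of branches (here, the group count) matters more than making each branch wider.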
Learning Strict Identity Mappings in Deep Residual Networks
TLDR
This paper proposes an architecture that automatically discards redundant layers, namely those whose responses are smaller than a threshold ε, without any loss in performance, achieving about an 80% reduction in the number of parameters.
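To make the thresholding idea concrete, here is a hypothetical post-hoc sketch reusing the ResNeXtBlock interface above: a block whose residual-branch response stays below ε contributes almost nothing beyond the identity mapping and can be dropped. The paper learns which layers collapse to identity during training; the probe-batch thresholding below is only illustrative.

import torch

@torch.no_grad()
def prune_redundant_blocks(blocks, x, eps=1e-2):
    # Hypothetical helper: keep only blocks whose residual-branch response
    # exceeds eps on a probe batch; a discarded block reduces to identity.
    kept = []
    for block in blocks:
        response = block.transform(x)          # residual branch only
        if response.abs().mean().item() >= eps:
            kept.append(block)
            x = block.relu(x + response)
        # else: the block is skipped; its output equals its input
    return kept, x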
Learning Transferable Architectures for Scalable Image Recognition
TLDR
This paper proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset, and introduces a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.
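As summarized, ScheduledDropPath is drop-path whose drop probability ramps up linearly over training. A minimal sketch of that reading (the function name and schedule shape are assumptions):

def scheduled_drop_path(x, final_drop_prob, step, total_steps, training=True):
    # Drop-path on a torch tensor `x` (one residual branch per call) whose
    # drop probability increases linearly over training; a no-op at eval.
    drop_prob = final_drop_prob * step / total_steps
    if not training or drop_prob == 0.0:
        return x
    keep_prob = 1.0 - drop_prob
    mask = x.new_empty(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
    return x * mask / keep_prob  # inverted scaling keeps expectations equal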
Gated Convolutional Networks with Hybrid Connectivity for Image Classification
TLDR
Experimental results on the CIFAR and ImageNet datasets show that HCGNet is markedly more efficient than DenseNet and can also significantly outperform state-of-the-art networks at lower complexity.
Data-Driven Sparse Structure Selection for Deep Neural Networks
TLDR
A simple and effective framework to learn and prune deep models in an end-to-end manner by adding sparsity regularization on scaling factors and solving the resulting optimization problem with a modified stochastic Accelerated Proximal Gradient (APG) method.
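The pruning signal comes from sparsity-regularized scaling factors. A simplified single proximal-gradient step for an L1 penalty is sketched below; the paper's accelerated (APG) variant adds a momentum term, omitted here for brevity:

def l1_proximal_step(factors, grad, lr, gamma):
    # One proximal-gradient update for sparsity-regularized scaling
    # factors (torch tensors): a gradient step followed by
    # soft-thresholding. Factors driven exactly to zero mark structures
    # (neurons, groups, or blocks) that can be pruned away.
    z = factors - lr * grad
    return z.sign() * (z.abs() - lr * gamma).clamp(min=0.0)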
Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation
TLDR
This paper proposes a "network decomposition" strategy, named Group-Net, in which each full-precision group is effectively reconstructed by aggregating a set of homogeneous binary branches, and shows strong generalization to other tasks.
Batch Normalization with Enhanced Linear Transformation
TLDR
This paper proposes to additionally consider each neuron's neighborhood when computing the output of batch normalization's linear transformation module, and shows that the resulting BNET accelerates the convergence of network training and enhances spatial information by assigning larger weights to the important neurons.
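One plausible reading of this summary in code: keep batch-norm statistics as usual, but widen the per-channel affine transform into a depthwise convolution so a neuron's spatial neighborhood enters the linear transformation. An illustrative sketch, not the paper's exact module:

import torch.nn as nn

class BNET2d(nn.Module):
    """Sketch: standard BN statistics, with the usual per-channel affine
    transform widened to a depthwise k x k convolution. The kernel size
    and other hyper-parameters here are assumptions."""

    def __init__(self, channels, k=3):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)
        self.linear = nn.Conv2d(channels, channels, kernel_size=k,
                                padding=k // 2, groups=channels, bias=True)

    def forward(self, x):
        return self.linear(self.bn(x))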
Rethinking Binary Neural Network for Accurate Image Classification and Semantic Segmentation
TLDR
This paper proposes to train a network with both binary weights and binary activations, designed specifically for mobile devices with limited computation capacity and power consumption, and claims that considering both value and structure approximation should be the future development direction of BNNs.
MultiGrain: a unified image embedding for classes and instances
TLDR
A key component of MultiGrain is a pooling layer that takes advantage of high-resolution images even with a network trained at a lower resolution; the resulting embedding provides state-of-the-art classification accuracy when fed to a linear classifier.
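The pooling in question belongs to the generalized-mean (GeM) family, which behaves gracefully when the test-time resolution exceeds the training resolution. A minimal sketch on a PyTorch tensor (the exponent p is illustrative):

def gem_pool(x, p=3.0, eps=1e-6):
    # Generalized-mean pooling over the spatial dimensions of a tensor
    # (N, C, H, W): p = 1 recovers average pooling, large p approaches
    # max pooling; a bigger input simply enlarges the pooling window.
    return x.clamp(min=eps).pow(p).mean(dim=(2, 3)).pow(1.0 / p)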
Squeeze-and-Excitation Networks
TLDR
This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets.
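The SE block is compact enough to sketch directly. A minimal PyTorch version, with the reduction ratio of 16 used in the paper:

import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: squeeze feature maps to a per-channel
    descriptor, pass it through a bottleneck MLP with sigmoid gating,
    and rescale the channels with the resulting weights."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # squeeze: (N, C) global descriptor
        w = self.fc(s).view(n, c, 1, 1)  # excitation: per-channel gates
        return x * w                     # recalibrate the feature maps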
Coded ResNeXt: a network for designing disentangled information paths
TLDR
This work takes a more general view of neural network architectures for classification and introduces an algorithm that defines, before training, the paths through which per-class information flows; using this algorithm, a lighter single-purpose binary classifier for a particular class can be extracted by removing the parameters that do not participate in that class's predefined information path.
...

References

Showing 1-10 of 56 references.
Understanding Deep Architectures using a Recursive Convolutional Network
TLDR
The notion that adding layers alone increases computational power is empirically confirmed within the context of convolutional layers; the number of feature maps appears ancillary, finding most of its benefit through the introduction of more weights.
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
TLDR
This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Identity Mappings in Deep Residual Networks
TLDR
The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
TLDR
DeCAF, an open-source implementation of deep convolutional activation features, is released along with all associated network parameters to enable vision researchers to conduct experiments with deep representations across a range of visual concept learning paradigms.
Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups
We propose a new method for creating computationally efficient and compact convolutional neural networks (CNNs) using a novel sparse connection structure that resembles a tree root. This allows a significant reduction in computational cost and number of parameters compared to state-of-the-art deep CNNs, without compromising accuracy.
Rethinking the Inception Architecture for Computer Vision
TLDR
This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
Network In Network
TLDR
With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
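In code, the NIN-style classification layer amounts to emitting one feature map per class and averaging each map globally, with no fully connected layers. A minimal sketch (the input channel count is an assumption for illustration):

import torch.nn as nn

num_classes = 10  # CIFAR-10-style setting, purely illustrative
head = nn.Sequential(
    # the last micro-network emits one feature map per class
    nn.Conv2d(192, num_classes, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),  # global average pooling: one value per map
    nn.Flatten(),             # (N, num_classes) logits, fed to softmax
)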
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
TLDR
This work gives clear empirical evidence that training with residual connections significantly accelerates the training of Inception networks, and presents several new streamlined architectures for both residual and non-residual Inception networks.
...