MobileNetV2: Inverted Residuals and Linear Bottlenecks

@article{Sandler2018MobileNetV2IR,
  title={MobileNetV2: Inverted Residuals and Linear Bottlenecks},
  author={Mark Sandler and Andrew G. Howard and Menglong Zhu and Andrey Zhmoginov and Liang-Chieh Chen},
  journal={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2018},
  pages={4510-4520}
}
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. [...] Key Method: MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to…
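
The inverted residual block lends itself to a short sketch. Below is a minimal PyTorch rendering of the idea as the abstract describes it (PyTorch, the layer names, and the channel sizes are our choices, not the authors' reference code): a 1x1 expansion with ReLU6, a 3x3 depthwise convolution, and a 1x1 linear projection back to the thin bottleneck, with a shortcut between bottlenecks when shapes allow.

import torch
from torch import nn

class InvertedResidual(nn.Module):
    # Sketch of an inverted residual block with a linear bottleneck.
    # expand_ratio, stride, and the ReLU6 choice follow the paper's
    # description; naming and initialization are illustrative.
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),           # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),               # depthwise 3x3
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),           # linear projection
            nn.BatchNorm2d(out_ch),                             # no activation here
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out

x = torch.randn(1, 32, 56, 56)
print(InvertedResidual(32, 32)(x).shape)  # torch.Size([1, 32, 56, 56])
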
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
TLDR
MobileBERT is a thin version of BERT_LARGE equipped with bottleneck structures and a carefully designed balance between self-attention and feed-forward networks, and it can be generically applied to various downstream NLP tasks via simple fine-tuning.
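
A minimal sketch of the bottleneck idea in a transformer layer (illustrative only; MobileBERT's actual widths, stacking, and attention/feed-forward balance are carefully tuned in the paper):

import torch
from torch import nn

class BottleneckTransformerLayer(nn.Module):
    # Illustrative only: project a wide hidden state down to a thin
    # "bottleneck" width, run self-attention and a feed-forward network
    # there, then project back up. Dimensions are hypothetical.
    def __init__(self, wide=512, thin=128, heads=4):
        super().__init__()
        self.down = nn.Linear(wide, thin)
        self.attn = nn.MultiheadAttention(thin, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(thin, 4 * thin), nn.GELU(),
                                 nn.Linear(4 * thin, thin))
        self.up = nn.Linear(thin, wide)

    def forward(self, x):                      # x: (batch, seq, wide)
        h = self.down(x)
        h = h + self.attn(h, h, h)[0]          # thin self-attention
        h = h + self.ffn(h)                    # thin feed-forward
        return x + self.up(h)                  # residual at the wide width

print(BottleneckTransformerLayer()(torch.randn(2, 16, 512)).shape)
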
Flattenet: A Simple and Versatile Framework for Dense Pixelwise Prediction
TLDR
This paper introduces a novel Flattening Module to produce high-resolution predictions without either removing any subsampling operations or building a complicated decoder module for Fully Convolutional Network (FCN) based models.
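
One hedged reading of a flattening-style module (not necessarily the paper's exact design) is a 1x1 convolution predicting an r x r patch of scores per coarse cell, rearranged depth-to-space to full resolution:

import torch
from torch import nn
import torch.nn.functional as F

class FlatteningHead(nn.Module):
    # Hypothetical sketch: each low-resolution feature vector is mapped to
    # an r x r patch of class scores, then rearranged depth-to-space,
    # avoiding both reduced subsampling and a heavy decoder.
    def __init__(self, in_ch, num_classes, r=8):
        super().__init__()
        self.r = r
        self.proj = nn.Conv2d(in_ch, num_classes * r * r, kernel_size=1)

    def forward(self, x):
        return F.pixel_shuffle(self.proj(x), self.r)  # (B, classes, H*r, W*r)

scores = FlatteningHead(256, 21, r=8)(torch.randn(1, 256, 28, 28))
print(scores.shape)  # torch.Size([1, 21, 224, 224])
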
SpotPatch: Parameter-Efficient Transfer Learning for Mobile Object Detection
TLDR
This work proposes an approach simultaneously optimizing for both accuracy and footprint, and presents the first systematic study of parameter-efficient transfer learning techniques on object detection tasks, using a setting similar to the Visual Decathlon.
PydMobileNet: Improved Version of MobileNets with Pyramid Depthwise Separable Convolution
TLDR
An improved version of MobileNet, called Pyramid Mobile Network, which is more flexible in tuning the trade-off between accuracy, latency and model size than MobileNets, evaluated on two highly competitive object recognition benchmark datasets.
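
The pyramid of depthwise convolutions can be sketched as parallel depthwise branches with different kernel sizes whose outputs are merged; the kernel sizes and the concatenate-then-fuse rule below are assumptions, not the paper's exact configuration:

import torch
from torch import nn

class PyramidDepthwise(nn.Module):
    # Hedged sketch: parallel depthwise convolutions with different
    # receptive fields, concatenated and fused by a 1x1 convolution.
    def __init__(self, ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch, bias=False)
            for k in kernel_sizes)
        self.fuse = nn.Conv2d(ch * len(kernel_sizes), ch, 1, bias=False)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(PyramidDepthwise(32)(torch.randn(1, 32, 28, 28)).shape)
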
MnasNet: Platform-Aware Neural Architecture Search for Mobile
TLDR
An automated mobile neural architecture search (MNAS) approach that explicitly incorporates model latency into the main objective, so that the search can identify a model achieving a good trade-off between accuracy and latency.
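
The latency-aware objective can be illustrated with the soft trade-off reward MnasNet describes, accuracy scaled by a power of the latency ratio; the target latency below is an illustrative choice:

def mnas_reward(accuracy, latency_ms, target_ms=80.0, w=-0.07):
    """Multi-objective reward: accuracy scaled by how far measured latency
    deviates from the target. w < 0 penalizes slow models while still
    rewarding accuracy (target and exponent here are illustrative)."""
    return accuracy * (latency_ms / target_ms) ** w

# A slightly slow but more accurate model vs. a fast, less accurate one.
print(mnas_reward(0.75, 90.0))   # ~0.7438
print(mnas_reward(0.74, 60.0))   # ~0.7551
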
Position-Aware Recalibration Module: Learning From Feature Semantics and Feature Position
TLDR
A Position-Aware Recalibration Module (PRM for short) which recalibrates features by leveraging both feature semantics and feature position; it can be seamlessly integrated into various base networks and applied to many position-aware visual tasks.
Rethinking BiSeNet For Real-time Semantic Segmentation
TLDR
A novel and efficient structure named Short-Term Dense Concatenate network (STDC network) is proposed: structural redundancy is removed by gradually reducing the dimension of the feature maps, and their aggregation is used for image representation, which forms the basic module of the STDC network.
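
The short-term dense concatenation can be sketched as a chain of convolution blocks whose widths shrink step by step, with every intermediate output concatenated; the block count and exact widths below are assumptions:

import torch
from torch import nn

class STDCModule(nn.Module):
    # Hedged sketch: successive conv blocks with shrinking widths
    # (C/2, C/4, C/8, C/8); their outputs are concatenated so the module
    # emits C channels while most computation happens at low width.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        widths = [out_ch // 2, out_ch // 4, out_ch // 8, out_ch // 8]
        blocks, prev = [], in_ch
        for i, w in enumerate(widths):
            k = 1 if i == 0 else 3
            blocks.append(nn.Sequential(
                nn.Conv2d(prev, w, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(w), nn.ReLU(inplace=True)))
            prev = w
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        outs = []
        for block in self.blocks:
            x = block(x)
            outs.append(x)
        return torch.cat(outs, dim=1)  # C/2 + C/4 + C/8 + C/8 = C channels

print(STDCModule(32, 256)(torch.randn(1, 32, 56, 56)).shape)
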
Learning Dynamic Routing for Semantic Segmentation
  • Yanwei Li, Lin Song, +4 authors Jian Sun
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR
A conceptually new method to alleviate scale variance in semantic representation, named dynamic routing, is proposed; it generates data-dependent routes that adapt to the scale distribution of each image, and it is compared with several static architectures, which can be modeled as special cases in the routing space.
X3D: Expanding Architectures for Efficient Video Recognition
  • Christoph Feichtenhofer
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR
This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth, finding that networks with high spatiotemporal resolution can perform well, while being extremely light in terms of network width and parameters.
Characterizing signal propagation to close the performance gap in unnormalized ResNets
TLDR
A simple set of analysis tools to characterize signal propagation on the forward pass is proposed, and this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth.
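
The flavor of such forward-pass analysis can be sketched by recording per-channel activation statistics at increasing depth of a toy unnormalized residual stack (the model below is a stand-in, not one of the paper's networks):

import torch
from torch import nn

# Toy unnormalized residual stack: track how the average squared channel
# mean grows with depth on random input, the kind of signal-propagation
# statistic the analysis tools are built around (model is illustrative).
torch.manual_seed(0)
blocks = nn.ModuleList(
    nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
    for _ in range(8))

x = torch.randn(16, 64, 32, 32)
with torch.no_grad():
    for depth, block in enumerate(blocks, start=1):
        x = x + block(x)                      # residual, no normalization
        chan_mean_sq = x.mean(dim=(0, 2, 3)).pow(2).mean()
        print(f"depth {depth}: avg squared channel mean = {chan_mean_sq:.4f}")
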

References

SHOWING 1-10 OF 49 REFERENCES
Aggregated Residual Transformations for Deep Neural Networks
TLDR
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintained complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
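
Aggregated transformations with cardinality C are typically realized as a grouped convolution; the sketch below illustrates the parameter saving at fixed width (an illustration, not the paper's exact blocks):

import torch
from torch import nn

# Cardinality via grouped convolution: 32 parallel 3x3 transformations
# over a 128-wide feature map, versus one dense 3x3 at the same width.
grouped = nn.Conv2d(128, 128, 3, padding=1, groups=32, bias=False)
dense = nn.Conv2d(128, 128, 3, padding=1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(grouped), count(dense))        # 4608 vs 147456 weights
x = torch.randn(1, 128, 14, 14)
print(grouped(x).shape == dense(x).shape)  # True: same output shape
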
ParseNet: Looking Wider to See Better
TLDR
This work presents a technique for adding global context to deep convolutional networks for semantic segmentation, and achieves state-of-the-art performance on SiftFlow and PASCAL-Context with small additional computational cost over baselines.
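
The global-context mechanism can be sketched as global average pooling tiled back to the spatial grid and concatenated with the (normalized) local features; layer choices below are illustrative:

import torch
import torch.nn.functional as F

def add_global_context(feat):
    """Hedged sketch of ParseNet-style global context: pool the feature
    map to a single vector, L2-normalize both paths, tile the global
    vector back to the spatial grid, and concatenate along channels."""
    b, c, h, w = feat.shape
    glob = feat.mean(dim=(2, 3), keepdim=True)          # global average pool
    glob = F.normalize(glob, dim=1).expand(b, c, h, w)  # normalize + unpool
    local = F.normalize(feat, dim=1)
    return torch.cat([local, glob], dim=1)              # (B, 2C, H, W)

print(add_global_context(torch.randn(1, 256, 32, 32)).shape)
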
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
TLDR
This work addresses the task of semantic image segmentation with deep learning and proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales; it improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
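
ASPP can be sketched as parallel atrous (dilated) convolutions at several rates over the same feature map, fused by concatenation and a 1x1 convolution; the rates follow common DeepLab settings and the head is simplified:

import torch
from torch import nn

class ASPP(nn.Module):
    # Sketch of atrous spatial pyramid pooling: parallel 3x3 convolutions
    # with increasing dilation rates see increasingly large receptive
    # fields at the same resolution; outputs are concatenated and fused.
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(ASPP(512, 256)(torch.randn(1, 512, 32, 32)).shape)
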
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
TLDR
This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy, and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, fine-grained classification, face attributes and large-scale geo-localization.
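
The width multiplier thins every layer of the depthwise separable stack; below is a sketch of one separable block under such a multiplier (a rendering of the idea, not the reference implementation):

import torch
from torch import nn

def separable_block(in_ch, out_ch, alpha=1.0, stride=1):
    """Depthwise separable convolution thinned by width multiplier alpha.
    alpha scales every layer's channel count, so cost falls roughly as
    alpha squared. (A sketch of the idea, not the reference code.)"""
    in_ch, out_ch = max(1, int(in_ch * alpha)), max(1, int(out_ch * alpha))
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),   # depthwise
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))  # pointwise

# alpha = 0.5 halves the channels; the second hyper-parameter (the
# resolution multiplier) would correspondingly shrink the input image.
block = separable_block(64, 128, alpha=0.5)
print(block(torch.randn(1, 32, 112, 112)).shape)
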
Rethinking Atrous Convolution for Semantic Image Segmentation
TLDR
The proposed 'DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
SSD: Single Shot MultiBox Detector
TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
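
Default-box generation can be sketched as emitting, at every cell of a feature map, boxes at that map's scale across several aspect ratios; the scale and ratios below are illustrative choices:

import itertools
import math

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Sketch of SSD-style default boxes for one feature map: one box per
    (cell, aspect ratio), centered on the cell, in normalized cx,cy,w,h.
    The per-map scale and ratio choices are illustrative."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

boxes = default_boxes(fmap_size=8, scale=0.2)
print(len(boxes), boxes[0])  # 192 boxes for an 8x8 map with 3 ratios
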
Learning Transferable Architectures for Scalable Image Recognition
TLDR
This paper proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset and introduces a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.
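
ScheduledDropPath can be sketched as dropping whole residual branches with a probability that ramps up linearly over training; the ramp shape and final rate below are assumptions:

import torch

def scheduled_drop_path(branch_out, step, total_steps, final_drop=0.3,
                        training=True):
    """Hedged sketch of ScheduledDropPath: drop an entire branch per
    sample with probability that increases linearly during training,
    rescaling survivors so expectations are unchanged."""
    drop_p = final_drop * step / total_steps   # linear schedule
    if not training or drop_p == 0.0:
        return branch_out
    keep = 1.0 - drop_p
    mask = torch.bernoulli(torch.full((branch_out.size(0), 1, 1, 1), keep,
                                      device=branch_out.device))
    return branch_out * mask / keep

out = scheduled_drop_path(torch.randn(4, 64, 8, 8), step=500, total_steps=1000)
print(out.shape)
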
Deep Pyramidal Residual Networks
TLDR
This research gradually increases the feature map dimension at all units to involve as many locations as possible in the network architecture and proposes a novel residual unit capable of further improving the classification accuracy with the new network architecture.
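
The gradual widening can be illustrated with an additive schedule in which each of N units adds a fixed fraction of a total widening factor alpha; the numbers below are illustrative:

def pyramid_widths(base=16, alpha=48, num_units=12):
    """Sketch of a pyramidal widening schedule: unit k has width
    floor(base + alpha * k / num_units), so the feature dimension grows
    a little at every unit instead of doubling at a few stages."""
    return [int(base + alpha * k / num_units) for k in range(num_units + 1)]

print(pyramid_widths())
# [16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64]
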
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
TLDR
A unified implementation of the Faster R-CNN, R-FCN and SSD systems is presented and the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures is traced out.
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost.