ELASTIC: Improving CNNs With Dynamic Scaling Policies

@inproceedings{Wang2019ELASTICIC,
  title={ELASTIC: Improving CNNs With Dynamic Scaling Policies},
  author={Huiyu Wang and Aniruddha Kembhavi and Ali Farhadi and Alan Loddon Yuille and Mohammad Rastegari},
  booktitle={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={2253-2262}
}
Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). We argue that the scaling policy should be learned from data. In this paper, we introduce Elastic, a simple, efficient and yet very effective approach to learn a dynamic scale policy from data. We formulate the scaling policy as a non-linear…
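The abstract's core idea — letting a layer apply its transform at more than one resolution and learning how to blend the branches — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `elastic_block`, the fixed `gate`, and the identity `transform` are all hypothetical stand-ins for the learned components.

```python
import numpy as np

def avg_pool2x(x):
    """Downsample an (H, W, C) feature map by 2 with average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbour upsampling by 2 back to the original resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def elastic_block(x, transform, gate=0.5):
    """Sketch of an Elastic-style block: the same transform runs at full
    and half resolution, and the two branches are blended. In the paper
    the blend is learned per layer; `gate` stands in for that learned
    scaling policy here."""
    full = transform(x)                          # full-resolution branch
    low = upsample2x(transform(avg_pool2x(x)))   # half-resolution branch
    return gate * full + (1.0 - gate) * low

# Toy usage: an 8x8 map with one channel and an identity "transform".
x = np.arange(64, dtype=float).reshape(8, 8, 1)
y = elastic_block(x, transform=lambda t: t)
print(y.shape)  # (8, 8, 1)
```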
CSPNet: A New Backbone that can Enhance Learning Capability of CNN
The proposed CSPNet respects the variability of the gradients by integrating feature maps from the beginning and the end of a network stage, which reduces computations by 20% with equivalent or even superior accuracy on the ImageNet dataset, and significantly outperforms state-of-the-art approaches in terms of AP50 on the MS COCO object detection dataset.
Hard-Attention for Scalable Image Classification
A novel architecture is proposed, TNet, which traverses an image pyramid in a top-down fashion, visiting only the most informative regions along the way, and can reduce data acquisition and annotation cost, since it attends only to a fraction of the highest resolution content, while using only image-level labels without bounding boxes.
Learning Scales from Points: A Scale-aware Probabilistic Model for Crowd Counting
This paper proposes a density pyramid network (DPN), where each pyramid level handles instances within a particular scale range, and adopts an instance-level probabilistic scale-aware model (IPSM) to guide the multi-scale training of DPN explicitly.
Image Classification Through Top-Down Image Pyramid Traversal
This work proposes a novel architecture that traverses an image pyramid in a top-down fashion, while it uses a hard attention mechanism to selectively process only the most informative image parts, and shows that its models can significantly outperform fully convolutional counterparts.
PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer
The proposed convolution operation, named Poly-Scale Convolution (PSConv), mixes up a spectrum of dilation rates and tactfully allocates them among the individual convolutional kernels of each filter within a single convolutional layer.
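The allocation idea in the summary above — spreading a spectrum of dilation rates over the kernels of each filter in one layer — can be sketched with a few lines of plain Python. The helper name and the cyclic assignment pattern are hypothetical illustrations, not the authors' code.

```python
def allocate_dilations(num_filters, in_channels, rates=(1, 2, 4)):
    """Return a num_filters x in_channels grid of dilation rates.

    Neighbouring kernels cycle through the rate spectrum, and the cycle
    is shifted per filter, so every filter mixes all scales within a
    single convolutional layer."""
    return [
        [rates[(f + c) % len(rates)] for c in range(in_channels)]
        for f in range(num_filters)
    ]

grid = allocate_dilations(num_filters=4, in_channels=6)
for row in grid:
    print(row)  # e.g. first row: [1, 2, 4, 1, 2, 4]
```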
Exploiting weakly supervised visual patterns to learn from partial annotations
This paper exploits relationships among images and labels to derive more supervisory signal from the un-annotated labels and can outperform baselines by a margin of 2-10% across all the datasets on mean average precision (mAP) and mean F1 metrics.
Image super-resolution via enhanced multi-scale residual network
Experiments on benchmark datasets suggest that the proposed enhanced multi-scale residual network (EMRN) performs favorably against state-of-the-art methods in reconstructing superior super-resolution (SR) images.
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
This work is the first attempt to use a subspace attention mechanism to increase the efficiency of compact CNNs, and argues that learning separate attention maps for each feature subspace enables multi-scale and multi-frequency feature representation, which is more desirable for fine-grained image classification.
Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales
A formalism for analysing the covariance and invariance properties of scale-channel networks is developed, and the work explores how different design choices, unique to scaling transformations, affect the overall performance of scale-channel networks.
Scale-covariant and scale-invariant Gaussian derivative networks
It is demonstrated that the resulting approach allows for scale generalization, enabling good performance for classifying patterns at scales not present in the training data.

References

Showing 1-10 of 43 references.
Feature Pyramid Networks for Object Detection
This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
ParseNet: Looking Wider to See Better
This work presents a technique for adding global context to deep convolutional networks for semantic segmentation, and achieves state-of-the-art performance on SiftFlow and PASCAL-Context with small additional computational cost over baselines.
Aggregated Residual Transformations for Deep Neural Networks
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy, and is more effective than going deeper or wider when capacity is increased.
Rethinking the Inception Architecture for Computer Vision
This work is exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
A fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints, which outperforms all the current efficient CNN networks such as MobileNet, ShuffleNet, and ENet on both standard metrics and the newly introduced performance metrics that measure efficiency on edge devices.
Fully Convolutional Networks for Semantic Segmentation
It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Rethinking Atrous Convolution for Semantic Image Segmentation
The proposed `DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
Multilabel Image Classification With Regional Latent Semantic Dependencies
The proposed RLSD achieves the best performance compared to the state-of-the-art models, especially for predicting small objects occurring in the images, and can approach the upper bound without using the bounding-box annotations, which is more realistic in the real world.
Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation
This work introduces another model - dubbed Recombinator Networks - where coarse features inform finer features early in their formation such that finer features can make use of several layers of computation in deciding how to use coarse features.
Hypercolumns for object segmentation and fine-grained localization
Using hypercolumns as pixel descriptors, this work defines the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel, and shows results on three fine-grained localization tasks: simultaneous detection and segmentation, and keypoint localization.
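The hypercolumn definition above is concrete enough to sketch: resize every layer's feature map to image resolution, then stack the activations at one pixel into a single descriptor. A minimal NumPy illustration with toy shapes; the helper names are hypothetical.

```python
import numpy as np

def upsample_to(x, size):
    """Nearest-neighbour resize of an (h, w, c) map to (size, size, c),
    assuming size is a multiple of h and w."""
    return x.repeat(size // x.shape[0], axis=0).repeat(size // x.shape[1], axis=1)

def hypercolumn(feature_maps, row, col, size):
    """Hypercolumn at pixel (row, col): concatenate the activations of
    every layer's feature map, after resizing each to image resolution."""
    return np.concatenate(
        [upsample_to(f, size)[row, col] for f in feature_maps]
    )

# Toy CNN activations at three resolutions for a 16x16 input.
rng = np.random.default_rng(0)
maps = [rng.standard_normal((16, 16, 4)),
        rng.standard_normal((8, 8, 8)),
        rng.standard_normal((4, 4, 16))]
hc = hypercolumn(maps, row=5, col=9, size=16)
print(hc.shape)  # (28,) — 4 + 8 + 16 channels stacked
```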