Corpus ID: 55687967

ELASTIC: Improving CNNs with Instance Specific Scaling Policies

@article{Wang2018ELASTICIC,
  title={ELASTIC: Improving CNNs with Instance Specific Scaling Policies},
  author={Huiyu Wang and Aniruddha Kembhavi and Ali Farhadi and Alan Loddon Yuille and Mohammad Rastegari},
  journal={ArXiv},
  year={2018},
  volume={abs/1812.05262}
}
Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues share a similar theme: a set of intuitive, manually designed policies that are generic and fixed (e.g., SIFT or feature pyramids). We argue that the scale policy should be learned from data. In this paper, we introduce ELASTIC, a simple, efficient, and yet very effective approach to learning an instance-specific scale policy from data. We formulate the scaling policy as a non… 
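The abstract's core idea of replacing a fixed scale policy with one learned per input can be illustrated with parallel branches that process a feature map at different resolutions and merge the results. The sketch below is a NumPy illustration under our own assumptions (function names, the 1x1-projection branches, and the equal-weight merge are illustrative, not the authors' implementation):

```python
import numpy as np

def avg_pool2x(x):
    """2x2 average pooling on an (H, W, C) feature map (H, W even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def elastic_style_block(x, w_hi, w_lo):
    """Illustrative two-resolution block: the high-resolution branch applies
    a 1x1 projection at full scale; the low-resolution branch downsamples,
    projects, and upsamples back. When trained end to end, the learned
    branch weights let each input decide how much each scale contributes."""
    hi = x @ w_hi                           # full-resolution path
    lo = upsample2x(avg_pool2x(x) @ w_lo)   # half-resolution path
    return 0.5 * (hi + lo)                  # merge; output keeps input shape

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))    # toy (H, W, C) feature map
w_hi = rng.standard_normal((4, 4))    # 1x1-conv weights, high-res branch
w_lo = rng.standard_normal((4, 4))    # 1x1-conv weights, low-res branch
y = elastic_style_block(x, w_hi, w_lo)
print(y.shape)  # (8, 8, 4)
```

Because both branches return to the input resolution, such a block can be dropped into an existing residual network without changing its downstream shapes.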
Multi-Dimensional Pruning: A Unified Framework for Model Compression
TLDR
This work proposes a unified model compression framework called Multi-Dimensional Pruning (MDP) to simultaneously compress the convolutional neural networks (CNNs) on multiple dimensions and demonstrates that the MDP framework outperforms the existing methods when pruning both 2D and 3D CNNs.
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution
TLDR
This work proposes to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially “slower” at a lower spatial resolution reducing both memory and computation cost.
DiCENet: Dimension-Wise Convolutions for Efficient Networks
TLDR
A novel and generic convolutional unit that is built using dimension-wise convolutions anddimension-wise fusion, that shows significant improvements over state-of-the-art models across various computer vision tasks including image classification, object detection, and semantic segmentation.
Exploring Multi-Scale Feature Propagation and Communication for Image Super Resolution
TLDR
This work presents a unified formulation over widely used multi-scale structures, Multi-Scale cross-Scale Share-weights convolution (MS³-Conv), which can achieve better SR performance than the standard convolution with fewer parameters and lower computational cost.

References

Showing 1-10 of 40 references
Feature Pyramid Networks for Object Detection
TLDR
This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Aggregated Residual Transformations for Deep Neural Networks
TLDR
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when the authors increase the capacity.
ParseNet: Looking Wider to See Better
TLDR
This work presents a technique for adding global context to deep convolutional networks for semantic segmentation, and achieves state-of-the-art performance on SiftFlow and PASCAL-Context with small additional computational cost over baselines.
Multilabel Image Classification With Regional Latent Semantic Dependencies
TLDR
The proposed RLSD achieves the best performance compared to the state-of-the-art models, especially for predicting small objects occurring in the images, and can approach the upper bound without using the bounding-box annotations, which is more realistic in the real world.
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
TLDR
A fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high-resolution images under resource constraints, which outperforms all current efficient CNNs such as MobileNet, ShuffleNet, and ENet on both standard metrics and newly introduced metrics that measure efficiency on edge devices.
Rethinking Atrous Convolution for Semantic Image Segmentation
TLDR
The proposed DeepLabv3 system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
Improving Pairwise Ranking for Multi-label Image Classification
  • Y. Li, Yale Song, Jiebo Luo
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
TLDR
A novel loss function for pairwise ranking is proposed, which is smooth everywhere, and a label decision module is incorporated into the model, estimating the optimal confidence thresholds for each visual concept.
Rethinking the Inception Architecture for Computer Vision
TLDR
This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation
TLDR
This work introduces another model - dubbed Recombinator Networks - where coarse features inform finer features early in their formation such that finer features can make use of several layers of computation in deciding how to use coarse features.
Hypercolumns for object segmentation and fine-grained localization
TLDR
This work defines the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel and, using hypercolumns as pixel descriptors, shows results on fine-grained localization tasks including simultaneous detection and segmentation, and keypoint localization.