• Corpus ID: 55687967

ELASTIC: Improving CNNs with Instance Specific Scaling Policies

@article{Wang2018ELASTICIC,
  title={ELASTIC: Improving CNNs with Instance Specific Scaling Policies},
  author={Huiyu Wang and Aniruddha Kembhavi and Ali Farhadi and Alan Loddon Yuille and Mohammad Rastegari},
  journal={ArXiv},
  year={2018},
  volume={abs/1812.05262}
}
Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues share a similar theme: a set of intuitive, manually designed policies that are generic and fixed (e.g. SIFT or feature pyramids). We argue that the scale policy should instead be learned from data. In this paper, we introduce ELASTIC, a simple, efficient and yet very effective approach to learning an instance-specific scale policy from data. We formulate the scaling policy as a non… 
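
As a rough illustration of the idea, the following PyTorch sketch adds a parallel low-resolution branch (downsample, convolve, upsample) next to a full-resolution convolution, in the spirit of ELASTIC; the class name ElasticConv and all hyperparameters are illustrative assumptions, not the authors' code.

# Minimal sketch of an ELASTIC-style parallel-scale branch (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticConv(nn.Module):
    """Runs the same 3x3 convolution at full and half resolution in parallel.

    The half-resolution branch downsamples, convolves, then upsamples back, so
    each instance can draw on whichever scale helps it most; the mix is learned
    implicitly through the two branches' weights.
    """
    def __init__(self, channels):
        super().__init__()
        self.full = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.low = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # Full-resolution path.
        y_full = self.full(x)
        # Low-resolution path: downsample -> conv -> upsample back to input size.
        y_low = F.avg_pool2d(x, kernel_size=2)
        y_low = self.low(y_low)
        y_low = F.interpolate(y_low, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return y_full + y_low

x = torch.randn(2, 64, 56, 56)
print(ElasticConv(64)(x).shape)  # torch.Size([2, 64, 56, 56])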

Multi-Dimensional Pruning: A Unified Framework for Model Compression

This work proposes a unified model compression framework called Multi-Dimensional Pruning (MDP) to simultaneously compress the convolutional neural networks (CNNs) on multiple dimensions and demonstrates that the MDP framework outperforms the existing methods when pruning both 2D and 3D CNNs.

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution

This work proposes to factorize the mixed feature maps by their frequencies, and designs a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially “slower” at a lower spatial resolution, reducing both memory and computation cost.
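
A simplified sketch of the octave idea, under the assumption of a 0.5 high/low channel split and bilinear upsampling; the real OctConv layer also handles strides and the first and last layers of a network, which are omitted here.

# Rough sketch of an Octave Convolution layer (simplified from the paper's description).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctConv(nn.Module):
    """Convolution over a (high-frequency, low-frequency) feature pair.

    A fraction alpha of the channels lives at half resolution; the four
    convolutions cover high->high, high->low, low->high and low->low updates,
    with average pooling / bilinear upsampling bridging the two resolutions.
    """
    def __init__(self, in_ch, out_ch, alpha=0.5, kernel_size=3, padding=1):
        super().__init__()
        in_lo, out_lo = int(alpha * in_ch), int(alpha * out_ch)
        in_hi, out_hi = in_ch - in_lo, out_ch - out_lo
        conv = lambda i, o: nn.Conv2d(i, o, kernel_size, padding=padding, bias=False)
        self.h2h, self.h2l = conv(in_hi, out_hi), conv(in_hi, out_lo)
        self.l2h, self.l2l = conv(in_lo, out_hi), conv(in_lo, out_lo)

    def forward(self, x_hi, x_lo):
        hi = self.h2h(x_hi) + F.interpolate(self.l2h(x_lo), size=x_hi.shape[-2:],
                                            mode="bilinear", align_corners=False)
        lo = self.l2l(x_lo) + self.h2l(F.avg_pool2d(x_hi, 2))
        return hi, lo

x_hi, x_lo = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32)
hi, lo = OctConv(64, 64)(x_hi, x_lo)
print(hi.shape, lo.shape)  # (1, 32, 64, 64) (1, 32, 32, 32)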

DiCENet: Dimension-Wise Convolutions for Efficient Networks

A novel and generic convolutional unit that is built using dimension-wise convolutions and dimension-wise fusion, which shows significant improvements over state-of-the-art models across various computer vision tasks including image classification, object detection, and semantic segmentation.

Exploring Multi-Scale Feature Propagation and Communication for Image Super Resolution

This work presents a unified formulation over widely used multi-scale structures, the Multi-Scale cross-Scale Share-weights convolution (MS³-Conv), which achieves better SR performance than standard convolution with fewer parameters and lower computational cost.


Feature Pyramid Networks for Object Detection

This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
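
The top-down pathway with lateral connections can be sketched in a few lines of PyTorch; the channel widths below follow a ResNet-50 backbone and are assumptions for illustration only.

# Minimal sketch of an FPN top-down pathway with lateral connections (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    """Builds a feature pyramid from backbone stages C2..C5.

    Each stage is projected to a common width with a 1x1 'lateral' conv, merged
    with the upsampled coarser level by element-wise addition, then smoothed
    with a 3x3 conv to give P2..P5.
    """
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):            # feats: [C2, C3, C4, C5], fine -> coarse
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):   # top-down merge
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]  # [P2, P3, P4, P5]

c2 = torch.randn(1, 256, 56, 56); c3 = torch.randn(1, 512, 28, 28)
c4 = torch.randn(1, 1024, 14, 14); c5 = torch.randn(1, 2048, 7, 7)
print([p.shape for p in FPN()([c2, c3, c4, c5])])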

Aggregated Residual Transformations for Deep Neural Networks

On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when the capacity is increased.
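
In practice the "cardinality" dimension reduces to a grouped convolution inside a bottleneck block; the sketch below uses the paper's 32x4d setting as an illustrative assumption.

# Sketch of a ResNeXt-style bottleneck: "cardinality" via grouped 3x3 convolution.
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """Aggregates `cardinality` parallel transformations with one grouped conv.

    Increasing the number of groups (paths) at fixed FLOPs is the knob the
    paper argues is more effective than going deeper or wider.
    """
    def __init__(self, channels=256, bottleneck_width=128, cardinality=32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, bottleneck_width, 1, bias=False),
            nn.BatchNorm2d(bottleneck_width), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_width, bottleneck_width, 3, padding=1,
                      groups=cardinality, bias=False),            # 32 parallel paths
            nn.BatchNorm2d(bottleneck_width), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_width, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.block(x))   # residual connection

x = torch.randn(1, 256, 56, 56)
print(ResNeXtBottleneck()(x).shape)  # torch.Size([1, 256, 56, 56])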

ParseNet: Looking Wider to See Better

This work presents a technique for adding global context to deep convolutional networks for semantic segmentation, and achieves state-of-the-art performance on SiftFlow and PASCAL-Context with small additional computational cost over baselines.
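
A minimal sketch of the global-context trick, assuming global average pooling followed by learned-scale L2 normalization and channel-wise concatenation; the parameter values are illustrative, not taken from the paper's configuration.

# Sketch of ParseNet-style global context: pool, L2-normalize, broadcast, concatenate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContext(nn.Module):
    """Appends a globally pooled descriptor to every spatial location.

    Both the local features and the global vector are L2-normalized (with a
    learned scale) so their magnitudes are comparable before concatenation.
    """
    def __init__(self, channels, init_scale=10.0):
        super().__init__()
        self.scale = nn.Parameter(torch.full((channels,), init_scale))

    def _l2norm(self, x):
        return F.normalize(x, dim=1) * self.scale.view(1, -1, 1, 1)

    def forward(self, x):
        g = F.adaptive_avg_pool2d(x, 1)                    # global average pool
        g = self._l2norm(g).expand(-1, -1, *x.shape[-2:])  # unpool: broadcast spatially
        return torch.cat([self._l2norm(x), g], dim=1)      # 2x channels out

x = torch.randn(1, 512, 32, 32)
print(GlobalContext(512)(x).shape)  # torch.Size([1, 1024, 32, 32])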

Multilabel Image Classification With Regional Latent Semantic Dependencies

The proposed RLSD achieves the best performance compared to state-of-the-art models, especially for predicting small objects occurring in the images, and can approach the upper bound without using bounding-box annotations, which is more realistic in real-world settings.

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
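
The core trick, replacing fully connected heads with 1x1 convolutions and upsampling the coarse scores back to the input resolution, can be sketched as follows; the tiny backbone here is only a stand-in for the converted classification networks used in the paper.

# Sketch of converting a classifier into a fully convolutional network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """A toy fully convolutional segmenter: every layer is a convolution, so
    arbitrary input sizes are accepted and the coarse output is upsampled back
    to the input resolution for dense, per-pixel prediction."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        # 1x1 conv plays the role of the fully connected classifier head.
        self.classifier = nn.Conv2d(128, num_classes, kernel_size=1)

    def forward(self, x):
        score = self.classifier(self.features(x))          # coarse class scores
        return F.interpolate(score, size=x.shape[-2:],     # upsample to input size
                             mode="bilinear", align_corners=False)

print(TinyFCN()(torch.randn(1, 3, 200, 300)).shape)  # torch.Size([1, 21, 200, 300])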

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

A fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints, which outperforms all the current efficient CNN networks such as MobileNet, ShuffleNet, and ENet on both standard metrics and the newly introduced performance metrics that measure efficiency on edge devices.
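
A simplified sketch of an ESP-style module, assuming a four-branch pyramid with dilation rates 1, 2, 4, 8 and the hierarchical feature fusion described in the paper; the channel counts are illustrative.

# Sketch of an ESP-style module: point-wise reduce, parallel dilated convolutions,
# hierarchical feature fusion (HFF), then concatenation (simplified).
import torch
import torch.nn as nn

class ESPModule(nn.Module):
    """Spatial pyramid of dilated convolutions on a reduced feature map.

    HFF adds each dilated branch to the previous one before concatenation,
    which the paper uses to suppress gridding artifacts.
    """
    def __init__(self, channels=64, branches=4):
        super().__init__()
        assert channels % branches == 0
        d = channels // branches
        self.reduce = nn.Conv2d(channels, d, kernel_size=1, bias=False)
        self.branches = nn.ModuleList(
            nn.Conv2d(d, d, 3, padding=2 ** k, dilation=2 ** k, bias=False)
            for k in range(branches))

    def forward(self, x):
        r = self.reduce(x)
        outs = [b(r) for b in self.branches]
        for k in range(1, len(outs)):        # hierarchical feature fusion
            outs[k] = outs[k] + outs[k - 1]
        return x + torch.cat(outs, dim=1)    # residual connection

print(ESPModule()(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])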

Rethinking Atrous Convolution for Semantic Image Segmentation

The proposed 'DeepLabv3' system significantly improves over previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
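
The atrous spatial pyramid pooling head at the heart of this line of work can be sketched as parallel dilated convolutions plus image-level pooling; the rates and widths below are illustrative assumptions rather than the released configuration.

# Sketch of an ASPP head in the spirit of DeepLabv3: parallel atrous convolutions
# plus image-level pooling, concatenated and projected.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Conv2d(in_ch, out_ch, 1, bias=False)])
        self.branches.extend(nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
                             for r in rates)
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=x.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

print(ASPP()(torch.randn(1, 2048, 33, 33)).shape)  # torch.Size([1, 256, 33, 33])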

Improving Pairwise Ranking for Multi-label Image Classification

  • Y. Li, Yale Song, Jiebo Luo
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
A novel loss function for pairwise ranking is proposed, which is smooth everywhere, and a label decision module is incorporated into the model, estimating the optimal confidence thresholds for each visual concept.
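
A sketch of a log-sum-exp style smooth pairwise ranking loss in this spirit; the exact normalization and the label-decision module of the paper are omitted, so treat this as an assumption-laden approximation rather than the published loss.

# Sketch of a smooth pairwise ranking (log-sum-exp style) loss for multi-label classification.
import torch

def smooth_pairwise_ranking_loss(scores, labels):
    """scores: (B, C) real-valued label scores; labels: (B, C) binary {0, 1}.

    For every (negative, positive) label pair the term exp(s_neg - s_pos) is
    accumulated inside a log, giving a loss that is smooth everywhere instead
    of the non-differentiable max-margin hinge.
    """
    pos = labels.bool()
    neg = ~pos
    # Pairwise score differences: diff[b, v, u] = s_v - s_u, shape (B, C, C).
    diff = scores.unsqueeze(2) - scores.unsqueeze(1)
    pair_mask = (neg.unsqueeze(2) & pos.unsqueeze(1)).float()   # v negative, u positive
    per_sample = torch.log1p((diff.exp() * pair_mask).sum(dim=(1, 2)))
    return per_sample.mean()

scores = torch.randn(4, 10, requires_grad=True)
labels = (torch.rand(4, 10) > 0.7).float()
loss = smooth_pairwise_ranking_loss(scores, labels)
loss.backward()
print(loss.item())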

Rethinking the Inception Architecture for Computer Vision

This work explores ways to scale up networks that utilize the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.
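
One of the factorizations discussed, replacing an n x n convolution with a 1 x n convolution followed by an n x 1 convolution, can be sketched directly; the kernel size and channel counts below are illustrative.

# Sketch of Inception-style convolution factorization: an n x n convolution replaced
# by a 1 x n convolution followed by an n x 1 convolution.
import torch
import torch.nn as nn

def factorized_conv(in_ch, out_ch, n=7):
    """Asymmetric factorization: cost scales with 2n instead of n^2 multiplies
    per output element, which is how the added compute is spent more efficiently."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=(1, n), padding=(0, n // 2), bias=False),
        nn.Conv2d(out_ch, out_ch, kernel_size=(n, 1), padding=(n // 2, 0), bias=False),
    )

x = torch.randn(1, 192, 17, 17)
print(factorized_conv(192, 192)(x).shape)  # torch.Size([1, 192, 17, 17])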

Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation

This work introduces another model - dubbed Recombinator Networks - where coarse features inform finer features early in their formation such that finer features can make use of several layers of computation in deciding how to use coarse features.
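
A sketch of the coarse-to-fine merge pattern described above: coarse features are upsampled and concatenated with finer features before the finer branch's convolutions, so the fine branch computes with coarse context rather than just summing at the end. The module below is a simplified stand-in, not the published architecture.

# Sketch of a Recombinator-style coarse-to-fine merge (simplified, illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecombinatorMerge(nn.Module):
    def __init__(self, fine_ch, coarse_ch, out_ch):
        super().__init__()
        # The fine branch's convolutions see the upsampled coarse features as input.
        self.conv = nn.Sequential(
            nn.Conv2d(fine_ch + coarse_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, fine, coarse):
        coarse_up = F.interpolate(coarse, size=fine.shape[-2:], mode="nearest")
        return self.conv(torch.cat([fine, coarse_up], dim=1))

fine, coarse = torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32)
print(RecombinatorMerge(64, 128, 64)(fine, coarse).shape)  # torch.Size([1, 64, 64, 64])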