Deformable Convolutional Networks

@inproceedings{Dai2017DeformableCN,
  title={Deformable Convolutional Networks},
  author={Jifeng Dai and Haozhi Qi and Yuwen Xiong and Yi Li and Guodong Zhang and Han Hu and Yichen Wei},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={764-773}
}
Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in their building modules. […] The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the performance of our approach. For the first time, we show that learning dense spatial transformation in deep…
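The key idea the abstract describes, that each convolution tap samples the input at a learned 2D offset from its regular grid position, with bilinear interpolation keeping the offsets differentiable, can be sketched as a minimal single-channel NumPy routine. All names below are illustrative and not taken from the authors' released code:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (H, W) at a fractional location (y, x).
    Locations outside the map contribute zero (zero padding)."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy1, wx1 = y - y0, x - x0
    val = 0.0
    for yy, wy in ((y0, 1.0 - wy1), (y0 + 1, wy1)):
        for xx, wx in ((x0, 1.0 - wx1), (x0 + 1, wx1)):
            if 0 <= yy < H and 0 <= xx < W:
                val += wy * wx * feat[yy, xx]
    return val

def deformable_conv2d(feat, kernel, offsets):
    """Naive single-channel deformable convolution.

    feat:    (H, W) input feature map
    kernel:  (k, k) convolution weights
    offsets: (H, W, k*k, 2) learned (dy, dx) per output location and tap;
             all-zero offsets reduce this to a standard zero-padded convolution.
    """
    H, W = feat.shape
    k = kernel.shape[0]
    r = k // 2
    taps = [(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1)]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for t, (di, dj) in enumerate(taps):
                dy, dx = offsets[i, j, t]
                # sample at the regular grid position plus the learned offset
                out[i, j] += kernel[di + r, dj + r] * bilinear_sample(
                    feat, i + di + dy, j + dj + dx)
    return out
```

In the paper's formulation the offset field is itself produced by a parallel convolutional branch and trained end-to-end with the rest of the network; the sketch above only shows the sampling step.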


Deformable Faster R-CNN with Aggregating Multi-Layer Features for Partially Occluded Object Detection in Optical Remote Sensing Images
TLDR
A new module, deformable convolution, is introduced and integrated into the prevailing Faster R-CNN; it learns to augment the spatial sampling locations in the modules from the target task without additional supervision, by adding 2D offsets to the regular sampling grid of standard convolution.
Deformable Convolutional Networks Tracker
TLDR
This work introduces a new deformable convolution module that greatly enhances a CNN's ability to model geometric transformations, and demonstrates outstanding performance on the Visual Tracker Benchmark (OTB-100) under the scale-variation and deformation attributes.
Deformable and residual convolutional network for image super-resolution
TLDR
A deformable and residual convolutional network (DefRCN) is developed to augment spatial sampling locations and enhance the transformation modelling capability of CNNs and the proposed upsample block allows the network to directly process low-resolution images, which reduces the computational resource cost.
NeurVPS: Neural Vanishing Point Scanning via Conic Convolution
TLDR
This work identifies a canonical conic space in which the neural network can effectively compute the global geometric information of vanishing points locally, and proposes a novel operator named conic convolution that can be implemented as regular convolutions in this space.
Towards Learning Affine-Invariant Representations via Data-Efficient CNNs
TLDR
A novel multi-scale maxout CNN is proposed and trained end-to-end with a novel rotation-invariant regularizer that enforces the weights in each 2D spatial filter to approximate circular patterns.
Volumetric Transformer Networks
TLDR
This work proposes a loss function defined between the warped features of pairs of instances, which improves the localization ability of VTN and consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
Improve Crowd Size Estimation by Leveraging Deformable Convolutional Neural Network and Deformable Region of Interest
TLDR
A new model for crowd estimation is proposed, leveraging a Deformable Convolutional Neural Network (DCN) and Deformable Region of Interest (DRoI) pooling; results show better accuracy, cost-effectiveness, and robustness of the model.
A Dual-Branch CNN Structure for Deformable Object Detection
TLDR
This work uses dual-branch parallel processing to extract different features of the target area to coordinate the prediction, and rebuilds the feature extraction module to enhance the performance of the network.
Deformable ConvNets V2: More Deformable, Better Results
TLDR
This work presents a reformulation of Deformable Convolutional Networks that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training, and guides network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features.
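The modulation mechanism that DCNv2 adds on top of the learned offsets can be sketched as a per-tap scalar, squashed into [0, 1], that scales each sampled value before the convolution weight is applied. A minimal sketch with illustrative names, not the authors' API:

```python
import numpy as np

def modulated_deformable_tap(weight, sampled_value, modulation_logit):
    """One kernel tap of a DCNv2-style layer: the bilinearly sampled input
    value is scaled by a learned modulation scalar in [0, 1] before the
    convolution weight is applied; a logit of 0 gives a modulation of 0.5,
    and a large positive logit recovers the unmodulated (v1) tap."""
    m = 1.0 / (1.0 + np.exp(-modulation_logit))  # sigmoid -> [0, 1]
    return weight * m * sampled_value
```

This lets the network not only shift a sampling location but also suppress its contribution entirely, which is what the paper means by an improved ability to focus on pertinent image regions.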
…

References

Showing 1–10 of 66 references
DeepID-Net: Deformable deep convolutional neural networks for object detection
TLDR
The proposed approach improves the mean average precision obtained by R-CNN, which was the state of the art, from 31% to 50.3% on the ILSVRC2014 detection test set.
Deformable part models are convolutional neural networks
TLDR
This paper shows that a DPM can be formulated as a CNN, providing a synthesis of the two ideas. The resulting model, called DeepPyramid DPM, significantly outperforms DPMs based on histogram-of-oriented-gradients (HOG) features and slightly outperforms a comparable version of the recently introduced R-CNN detection system, while running significantly faster.
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
TLDR
This work equips the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement, and develops a new network structure, called SPP-net, which can generate a fixed-length representation regardless of image size/scale.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
TLDR
This work addresses the task of semantic image segmentation with Deep Learning and proposes atrous spatial pyramid pooling (ASPP), which is proposed to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
Spatial Transformer Networks
TLDR
This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.
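The grid-generation-and-sampling step of a spatial transformer can be sketched as follows, assuming a 2x3 affine matrix `theta` mapping normalized output coordinates in [-1, 1] to normalized input coordinates (names are illustrative, not the paper's notation):

```python
import numpy as np

def affine_grid_sample(feat, theta):
    """Minimal spatial-transformer sampling on a single-channel map.

    feat:  (H, W) input feature map
    theta: (2, 3) affine matrix mapping normalized output coords (x, y, 1)
           in [-1, 1] to normalized input coords (x', y').
    """
    H, W = feat.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            # normalized output coordinates for this pixel
            yn = 2.0 * i / (H - 1) - 1.0
            xn = 2.0 * j / (W - 1) - 1.0
            xs, ys = theta @ np.array([xn, yn, 1.0])
            # back to input pixel coordinates, then bilinear sampling
            y = (ys + 1.0) * (H - 1) / 2.0
            x = (xs + 1.0) * (W - 1) / 2.0
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
                for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
                    if 0 <= yy < H and 0 <= xx < W:
                        out[i, j] += wy * wx * feat[yy, xx]
    return out
```

In the full module, `theta` is regressed by a small localization network from the input itself, so the transform is input-dependent and trained with the rest of the network; this is the contrast with deformable convolution, which learns free-form per-location offsets rather than a single global warp.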
R-FCN: Object Detection via Region-based Fully Convolutional Networks
TLDR
This work presents region-based, fully convolutional networks for accurate and efficient object detection, and proposes position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection.
Transformation-Invariant Convolutional Jungles
  • D. Laptev and J. Buhmann, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
TLDR
A novel supervised feature learning approach efficiently extracts information from constraints to produce interpretable, transformation-invariant features; these boost the discrimination power of a novel image classification and segmentation method called Transformation-Invariant Convolutional Jungles (TICJ).
TI-POOLING: Transformation-Invariant Pooling for Feature Learning in Convolutional Neural Networks
TLDR
A deep neural network topology is presented that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING), which efficiently handles prior knowledge about nuisance variations in the data, such as rotation or scale changes.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TLDR
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
…