What Is Holding Back Convnets for Detection?

@inproceedings{Pepik2015WhatIH,
  title={What Is Holding Back Convnets for Detection?},
  author={Bojan Pepik and Rodrigo Benenson and Tobias Ritschel and Bernt Schiele},
  booktitle={GCPR},
  year={2015}
}
Convolutional neural networks have recently shown excellent results in general object detection and many other tasks. Albeit very effective, they involve many user-defined design choices. In this paper we want to better understand these choices by inspecting two key aspects: "what did the network learn?" and "what can the network learn?". We exploit new annotations (Pascal3D+) to enable a new empirical analysis of the R-CNN detector. Despite common belief, our results indicate that existing…
Increasing CNN Robustness to Occlusions by Reducing Filter Support
TLDR
This work starts by studying the effect of partial occlusions on the trained CNN and shows, empirically, that training on partially occluded examples reduces the spatial support of the filters, and argues that smaller filter support is beneficial for occlusion robustness.
Domain randomization for neural network classification
TLDR
It is shown that a sufficiently well generated synthetic image dataset can be used to train a neural network classifier that rivals state-of-the-art models trained on real datasets, achieving accuracy levels as high as 88% on a baseline cats vs dogs classification task.
Neural Networks for Cross-Modal Recognition and Vector Object Generation
TLDR
This report discusses how to train neural networks more in line with the following two intuitions: (i) the same scene can appear very differently and be depicted in different modalities, and (ii) complex objects can be explained by simple primitives.
Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks
TLDR
This article reviews the recent literature on object detection with deep CNN, in a comprehensive way, and provides an in-depth view of these recent advances.
Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views
This paper presents an end-to-end convolutional neural network (CNN) for 2D-3D exemplar detection. We demonstrate that the ability to adapt the features of natural images to better align with those
Visual Perception with Synthetic Data
TLDR
A method that reconstructs the surface of objects from a single view in uncalibrated illumination conditions is developed, and a method to speed up the annotation dramatically is developed by recognizing shared resources and automatically propagating annotations across the dataset.
In Search of the Minimal Recognizable Patch
TLDR
To find the minimal recognizable patches, a special neural architecture is designed that identifies the most informative patch and classifies the image based on the information within it; the minimal patches differ between and within categories, and increase in size as the required accuracy rises.
Convolutional Models for Joint Object Categorization and Pose Estimation
TLDR
This paper investigates and analyzes the layers of various CNN models and extensively compares them, with the goal of discovering how the distributed representations in CNN layers encode object pose information and how this contrasts with object category representations.
On the Robustness of Semantic Segmentation Models to Adversarial Attacks
TLDR
This paper presents what to their knowledge is the first rigorous evaluation of adversarial attacks on modern semantic segmentation models, using two large-scale datasets and shows how mean-field inference in deep structured models and multiscale processing naturally implement recently proposed adversarial defenses.

References (showing 1-10 of 48)
Return of the Devil in the Details: Delving Deep into Convolutional Nets
TLDR
It is shown that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost, and it is identified that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance.
Striving for Simplicity: The All Convolutional Net
TLDR
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
Exploring Invariances in Deep Convolutional Neural Networks Using Synthetic Images
TLDR
This work uses synthetic images to probe DCNN invariance to object-class variations caused by 3D shape, pose, and photorealism, and shows that DCNNs used as a fixed representation exhibit a large amount of invariances to these factors, but, if allowed to adapt, can still learn effectively from synthetic data.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Persistent Evidence of Local Image Properties in Generic ConvNets
Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Intriguing properties of neural networks
TLDR
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
CNN Features Off-the-Shelf: An Astounding Baseline for Recognition
TLDR
A series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13 suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
FlowNet: Learning Optical Flow with Convolutional Networks
TLDR
This paper constructs CNNs which are capable of solving the optical flow estimation problem as a supervised learning task, and proposes and compares two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations.
Measuring Invariances in Deep Networks
TLDR
A number of empirical tests are proposed that directly measure the degree to which these learned features are invariant to different input transformations and find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images and convolutional deep belief networks learn substantially more invariant Features in each layer.