• Corpus ID: 12626239

Mid-level Elements for Object Detection

Authors: Aayush Bansal, Abhinav Shrivastava, Carl Doersch, Abhinav Kumar Gupta
Building on the success of recent discriminative mid-level elements, we propose a surprisingly simple approach for object detection that performs comparably to the current state-of-the-art approaches on the PASCAL VOC comp-3 detection challenge (no external data). Through extensive experiments and ablation analysis, we show how our approach effectively improves upon HOG-based pipelines by adding an intermediate mid-level representation for the task of object detection. This representation is…
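The abstract describes inserting a mid-level layer between HOG features and the detector. As a rough, hypothetical sketch of what such a layer could compute (not the authors' pipeline), each mid-level element is treated here as a linear HOG template, and a candidate window is re-described by each element's maximum response inside it; an ordinary linear classifier could then score that K-dimensional vector. The function name `mid_level_features`, the array layouts, and the max-pooling choice are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mid_level_features(hog_map, weights, biases):
    """Max-pooled responses of K linear element detectors over a HOG map.

    Hypothetical layer, for illustration only.
    hog_map: (H, W, C) HOG cells covering one candidate window.
    weights: (K, h, w, C) linear templates, one per mid-level element.
    biases:  (K,) detector offsets.
    Returns a (K,) vector: each element's best response anywhere in the window.
    """
    K, h, w, C = weights.shape
    # windows[a, b, c, i, j] == hog_map[a + i, b + j, c]: every h x w placement.
    windows = sliding_window_view(hog_map, (h, w), axis=(0, 1))
    # Dot product of every placement with every template, plus its bias.
    responses = np.einsum("abcij,kijc->abk", windows, weights) + biases
    # Max-pool over all placements inside the window.
    return responses.max(axis=(0, 1))

rng = np.random.default_rng(0)
hog = rng.random((8, 8, 31))
templates = rng.standard_normal((5, 3, 3, 31))
feature = mid_level_features(hog, templates, np.zeros(5))
print(feature.shape)  # (5,)
```

Max-pooling makes the descriptor tolerant to where inside the window an element fires, which is one plausible reason a mid-level layer could help a rigid HOG template.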


PixelNet: Representation of the pixels, by the pixels, and for the pixels

It is demonstrated that stratified sampling of pixels adds diversity to batch updates, which speeds up learning and makes it possible to efficiently train state-of-the-art models tabula rasa (i.e., "from scratch") for diverse pixel-labeling tasks.
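As a rough illustration of the sampling idea, the sketch below draws the same number of distinct pixel locations from every image in a mini-batch, treating each image as one stratum, so the batch mixes pixels from many images instead of many correlated pixels from a few. The helper name `sample_pixels_per_image`, the per-image count, and uniform sampling within each image are assumptions, not PixelNet's actual code.

```python
import numpy as np

def sample_pixels_per_image(batch_shape, pixels_per_image, rng=None):
    """Return (image_index, row, col) rows, with the same number of distinct
    pixels drawn from every image in the batch (one stratum per image).

    Hypothetical helper: uniform sampling without replacement inside each image.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_images, height, width = batch_shape
    if pixels_per_image > height * width:
        raise ValueError("cannot sample more pixels than an image contains")
    samples = []
    for i in range(n_images):
        # Draw flat indices without replacement, then convert to (row, col).
        flat = rng.choice(height * width, size=pixels_per_image, replace=False)
        rows, cols = np.unravel_index(flat, (height, width))
        samples.append(np.stack([np.full(pixels_per_image, i), rows, cols], axis=1))
    return np.concatenate(samples, axis=0)

idx = sample_pixels_per_image((4, 32, 32), pixels_per_image=100,
                              rng=np.random.default_rng(0))
print(idx.shape)  # (400, 3)
```

Only the sampled locations would then contribute to the loss, so the batch size in pixels stays fixed while the number of images it covers can grow.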

Multi-scale Patch Aggregation (MPA) for Simultaneous Detection and Segmentation

A unified trainable network operating on patches is designed and followed by a fast and effective patch aggregation algorithm that infers object instances from mid-level patches; the approach benefits from end-to-end training.

DeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns

A novel convolutional neural network approach is proposed that mines mid-level image patches specialized enough to resolve the corresponding subtleties, and trains a newly designed CNN (DeepPattern) that learns discriminative patch groups.

3D attention-driven depth acquisition for object identification

A 3D attention model is developed that selects the best views to scan from, as well as the most informative regions in each view to focus on, to achieve efficient object recognition; it yields focus-driven features that are quite robust against object occlusion.

Mining Mid-level Visual Patterns with Deep CNN Activations

It is shown that convolutional neural network activations extracted from image patches typically possess two appealing properties that enable seamless integration with pattern mining techniques, and that this approach outperforms or matches the state of the art on the evaluated tasks.
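The summary does not say how activations are handed to a pattern miner. One way to do it, sketched here as an assumption rather than as the paper's exact procedure, is to turn each patch's non-negative activation vector into a "transaction" holding the indices of its k largest dimensions, then count how often sets of indices co-occur across patches. The names `to_transaction` and `itemset_support`, and the choice of k, are hypothetical.

```python
from collections import Counter
from itertools import combinations

import numpy as np

def to_transaction(activation, k):
    """Hypothetical encoding: keep the indices of the k largest activations."""
    top = np.argsort(activation)[::-1][:k]
    return frozenset(int(i) for i in top)

def itemset_support(transactions, size):
    """Fraction of transactions that contain each itemset of the given size."""
    counts = Counter()
    for t in transactions:
        for items in combinations(sorted(t), size):
            counts[items] += 1
    n = len(transactions)
    return {items: c / n for items, c in counts.items()}

# Three toy patch activations (e.g. after a ReLU, hence non-negative).
acts = np.array([
    [0.0, 2.0, 0.5, 3.0],
    [0.1, 1.5, 0.0, 2.5],
    [4.0, 0.0, 0.2, 0.3],
])
transactions = [to_transaction(a, k=2) for a in acts]
support = itemset_support(transactions, size=2)
print(support[(1, 3)])  # 0.6666666666666666 (dims 1 and 3 co-fire in 2 of 3 patches)
```

Frequent itemsets with high support would be candidate "patterns"; a full miner (e.g. Apriori-style association rules) would prune by support and confidence rather than enumerate every pair.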

Mining Mid-Level Visual Elements for Object Detection in High-Resolution Remote Sensing Images

This work proposes a novel and effective object detection method for high-resolution remote sensing (HRS) images based on mid-level visual element representations, and compares it with several state-of-the-art bag-of-words (BOW) and part-based models.

Revisiting Visual Pattern Mining

This work proposes an unsupervised pattern mining algorithm that works very well given a large unlabelled dataset, then extends it to incorporate labelled data, so that it can extract information from labelled and unlabelled data together.

Supervision Beyond Manual Annotations for Learning Visual Representations

This thesis shows that one can often train a representation that captures similarity beyond what is labeled in a given dataset, and proposes using pretext tasks: tasks that are not useful in and of themselves, but serve as an excuse to learn a more general-purpose representation.

Semi-Supervised Co-Analysis of 3D Shape Styles from Projected Lines

We present a semi-supervised co-analysis method for learning 3D shape styles from projected feature lines, achieving style patch localization with only weak supervision. Given a collection of 3D…

Head and Body Orientation Estimation Using Convolutional Random Projection Forests

A convolutional random projection forest (CRPforest) algorithm is proposed that estimates head and body pose well on benchmark datasets and performs favorably against state-of-the-art methods on low-resolution images with noise, occlusion, and motion blur.



Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving an mAP of 53.3%.
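The two numbers in this summary bound the earlier result: if 53.3% mAP is more than a 30% relative gain, the previous best must have been below 53.3 / 1.3 = 41.0% mAP. The snippet below only performs that arithmetic; it does not recover the exact prior figure, which the summary does not state. The helper names are illustrative.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`; 0.30 means +30%."""
    return (new - old) / old

def max_previous(new, min_relative_gain):
    """Exclusive upper bound on the old score, given that `new` beats it by
    more than `min_relative_gain` (relative)."""
    return new / (1.0 + min_relative_gain)

bound = max_previous(53.3, 0.30)
print(round(bound, 1))  # 41.0
```

For example, a previous best of 40.0% would correspond to a relative gain of about 33%, consistent with "more than 30%", whereas 42.0% would not.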

Bottom-Up Segmentation for Top-Down Detection

A novel deformable part-based model is proposed that exploits region-based segmentation algorithms, which compute candidate object regions by bottom-up clustering followed by ranking of those regions; the model outperforms the previous state of the art on the VOC 2010 test set by 4%.

Mid-level Visual Element Discovery as Discriminative Mode Seeking

Given a weakly-labeled image collection, this method discovers visually-coherent patch clusters that are maximally discriminative with respect to the labels, and proposes the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches.

Learning Collections of Part Models for Object Recognition

The detection system is competitive with the best existing systems and outperforms other HOG-based detectors on the more deformable categories; the part detectors' ability to discriminate and localize annotated keypoints is also evaluated on PASCAL VOC 2010.

Segmentation Driven Object Detection with Fisher Vectors

A method is proposed that produces tentative object segmentation masks to suppress background clutter in the features, which significantly improves object detection; it also exploits contextual features in the form of a full-image Fisher vector (FV) descriptor, together with an inter-category rescoring mechanism.

Object Detection Using Strongly-Supervised Deformable Part Models

This method is able to deal with sub-optimal and incomplete annotations of object parts and is shown to benefit from semi-supervised learning setups where part-level annotation is provided for a fraction of positive examples only.

Regionlets for Generic Object Detection

This work proposes to model an object class by a cascaded boosting classifier that integrates various types of features from competing local regions, named regionlets; with a single method, it significantly outperforms the state of the art on popular multi-class detection benchmark datasets.

Deep Neural Networks for Object Detection

This paper presents a simple yet powerful formulation of object detection as a regression problem to object bounding box masks, and defines a multi-scale inference procedure that produces high-resolution object detections at low cost with only a few network applications.
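To make the regression target concrete, the sketch below turns a ground-truth bounding box into a coarse d × d mask in which each grid cell holds the fraction of its area covered by the box; a network would then be trained to regress this mask. The grid size, the area-fraction encoding, and the helper name `box_to_mask` are assumptions for illustration, not necessarily the paper's exact target definition.

```python
import numpy as np

def box_to_mask(box, image_size, d):
    """Encode box (x0, y0, x1, y1), in pixels, as a d x d grid whose cells hold
    the fraction of each cell's area that the box covers (hypothetical target)."""
    width, height = image_size
    x0, y0, x1, y1 = box
    cell_w, cell_h = width / d, height / d
    mask = np.zeros((d, d))
    for r in range(d):
        for c in range(d):
            cx0, cy0 = c * cell_w, r * cell_h
            cx1, cy1 = cx0 + cell_w, cy0 + cell_h
            # Overlap between the box and this cell along each axis (0 if disjoint).
            ox = max(0.0, min(x1, cx1) - max(x0, cx0))
            oy = max(0.0, min(y1, cy1) - max(y0, cy0))
            mask[r, c] = (ox * oy) / (cell_w * cell_h)
    return mask

# A box covering the left half of a 100 x 100 image on a 4 x 4 grid.
m = box_to_mask((0, 0, 50, 100), image_size=(100, 100), d=4)
print(m)  # left two columns are 1.0, right two columns are 0.0
```

Because the target is a low-resolution mask rather than four coordinates, precise box edges have to be recovered at inference, which is where a multi-scale procedure like the one summarized above would come in.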

The Pascal Visual Object Classes (VOC) Challenge

The state of the art in the evaluated methods for both classification and detection is reviewed, along with analyses of whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
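The detection comparisons in this review rest on the challenge's evaluation protocol: a detection is correct when its bounding-box overlap (intersection over union) with a ground-truth box exceeds 50% and that object has not already been claimed by a higher-scoring detection, and per-class average precision is summarized by 11-point interpolation over recall levels 0, 0.1, …, 1.0 in the original protocol. The sketch below implements those pieces in simplified form; the function names are illustrative, and it omits details such as "difficult" objects and pixel-inclusive box coordinates.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of boxes (x0, y0, x1, y1), continuous coordinates."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_detections(dets, gts, thresh=0.5):
    """Label (score, box) detections in one image as TP/FP, highest score first.
    Each detection is assigned to its best-overlapping ground truth; if that
    object was already matched, the detection is a false positive (duplicate)."""
    used = [False] * len(gts)
    labels = []
    for score, box in sorted(dets, key=lambda d: -d[0]):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            o = iou(box, gt)
            if o > best:
                best, best_j = o, j
        if best > thresh and not used[best_j]:
            used[best_j] = True
            labels.append((score, True))
        else:
            labels.append((score, False))
    return labels

def voc11_ap(labels, n_gt):
    """11-point interpolated AP from (score, is_tp) pairs pooled over images."""
    labels = sorted(labels, key=lambda l: -l[0])
    tp = np.cumsum([1 if t else 0 for _, t in labels])
    fp = np.cumsum([0 if t else 1 for _, t in labels])
    recall = tp / n_gt
    precision = tp / np.maximum(tp + fp, 1)
    ap = 0.0
    for i in range(11):
        r = i / 10
        # Interpolated precision: best precision at any recall >= r (0 if none).
        p = precision[recall >= r]
        ap += (p.max() if p.size else 0.0) / 11
    return ap

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 10, 10)), (0.7, (20, 20, 30, 30))]
labels = match_detections(dets, gts)
print(voc11_ap(labels, n_gt=len(gts)))  # 28/33, about 0.848
```

In the usage example the 0.8-scored box overlaps the first object by 81% but is still a false positive, because that object was already matched by the 0.9-scored detection.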

Building Part-Based Object Detectors via 3D Geometry

A joint geometric and appearance based representation not only allows the authors to achieve state-of-the-art results on object detection but also allows them to tackle the grand challenge of understanding 3D objects from 2D images.