Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick. European Conference on Computer Vision.
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. […] Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

Automatic dataset generation for specific object detection

Proposes a method to synthesize object-in-scene images that preserves the objects' detailed features without introducing irrelevant information, and shows that in the synthesized images the object boundaries blend well with the background.

Semantic Understanding of Scenes Through the ADE20K Dataset

This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.

Best of both worlds: human-machine collaboration for object annotation (preliminary version)

The proposed image annotation system seamlessly integrates multiple computer vision models with multiple sources of human input in a Markov Decision Process and is able to incorporate novel objects instead of relying on a limited training set.

DeepScores-A Dataset for Segmentation, Detection and Classification of Tiny Objects

A detailed statistical analysis of the DeepScores dataset is presented, comparing it with other computer vision datasets like PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, as well as with other OMR datasets.

Shape-aware Instance Segmentation

This paper introduces a novel object segment representation based on the distance transform of the object masks, and designs an object mask network (OMN) with a new residual-deconvolution architecture that infers such a representation and decodes it into the final binary object mask.
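To make the idea of a distance-transform mask representation concrete, here is a minimal sketch, assuming a binary mask stored as a list of lists and the city-block (4-connected) distance. This is not the paper's OMN or its exact representation (its metric, truncation, or quantization are not stated here); the function names `distance_transform` and `decode` are hypothetical. Each foreground pixel stores its distance to the nearest background pixel, and the binary mask is recovered by keeping every pixel with a positive distance.

```python
from collections import deque


def distance_transform(mask):
    """City-block distance from each foreground pixel to the nearest background pixel.

    Background pixels get 0. Pixels outside the image are treated as background,
    so foreground pixels on the image border get distance 1.
    """
    h, w = len(mask), len(mask[0])
    # Pad with a one-pixel background ring to model "outside the image".
    H, W = h + 2, w + 2
    padded = [[0] * W for _ in range(H)]
    for y in range(h):
        for x in range(w):
            padded[y + 1][x + 1] = 1 if mask[y][x] else 0

    # Multi-source BFS seeded with every background pixel at distance 0.
    dist = [[None] * W for _ in range(H)]
    queue = deque()
    for y in range(H):
        for x in range(W):
            if padded[y][x] == 0:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))

    # Crop the padding ring away again.
    return [row[1:-1] for row in dist[1:-1]]


def decode(dist):
    """Recover the binary mask: a pixel is foreground iff its distance is positive."""
    return [[1 if d > 0 else 0 for d in row] for row in dist]


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_transform(mask)  # centre pixel gets 2, the rest of the square gets 1
```

One appeal of such a representation is that interior pixels carry information about how far the object boundary is, so even a partial view of the object hints at its full extent.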

Pixel Objectness

This work proposes an end-to-end learning framework for foreground object segmentation that substantially improves the state-of-the-art on foreground segmentation on the ImageNet and MIT Object Discovery datasets and generalizes well to segment object categories unseen in the foreground maps used for training.

In pixels we trust: From Pixel Labeling to Object Localization and Scene Categorization

This paper proposes to tackle the problems of image pixel labeling, object detection and scene classification from a bottom-up perspective, taking as input a semantic segmentation of the scene computed with the DeepLab architecture based on the ResNet deep network.

Object Counting and Instance Segmentation With Image-Level Supervision

This work is the first to propose image-level supervised density map estimation for common object counting; it demonstrates the approach's effectiveness for image-level supervised instance segmentation and outperforms existing methods, including those using instance-level supervision, on both datasets for common object counting.

Blending the Past and Present of Automatic Image Annotation

This thesis attempts to address the image annotation task by a CNN-RNN framework that jointly models label dependencies in an image while annotating it and proposes a new method to learn multiple label prediction paths.

Best of both worlds: Human-machine collaboration for object annotation

This paper empirically validates the effectiveness of the human-in-the-loop labeling approach on the ILSVRC2014 object detection dataset; the system seamlessly integrates multiple computer vision models with multiple sources of human input in a Markov Decision Process.

Learning Spatial Context: Using Stuff to Find Things

This paper clusters image regions based on their ability to serve as context for the detection of objects and shows that the things and stuff (TAS) context model produces meaningful clusters that are readily interpretable, and helps improve detection ability over state-of-the-art detectors.

LabelMe: A Database and Web-Based Tool for Image Annotation

A web-based tool that allows easy image annotation and instant sharing of such annotations is developed and a large dataset that spans many object categories, often containing multiple instances over a wide variety of images is collected.

Semantic object classes in video: A high-definition ground truth database

Object segmentation by alignment of poselet activations to image contours

This paper builds upon the part-based poselet detector, which can predict masks for numerous parts of an object, and extends poselets to 19 categories besides person.

Using Segmentation to Verify Object Hypotheses

D. Ramanan. 2007 IEEE Conference on Computer Vision and Pattern Recognition.
An approach to object recognition that combines detection and segmentation within an efficient hypothesize/test framework, leading to significant improvements over established approaches such as Viola-Jones and Dalal-Triggs on a variety of benchmark datasets, including the PASCAL challenge, LabelMe, and the INRIA Person dataset.

80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition

For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.
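The relative-improvement claim implies an upper bound on the previous best result, which is easy to miss. Writing m_new = 53.3% for the new mAP and m_prev for the previous best (a symbol introduced here for illustration), a relative gain above 30% gives:

```latex
\frac{m_{\text{new}} - m_{\text{prev}}}{m_{\text{prev}}} > 0.30
\;\Longrightarrow\;
m_{\text{prev}} < \frac{m_{\text{new}}}{1.30} = \frac{53.3\%}{1.30} = 41.0\%
```

So the summary implies that the prior best mAP on VOC 2012 was below 41.0%; the exact prior figure is not given here.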

Caltech-256 Object Category Dataset

A challenging set of 256 object categories containing a total of 30,607 images is introduced, and the clutter category is used to train an interest detector that rejects uninformative background regions.

SUN database: Large-scale scene recognition from abbey to zoo

This paper proposes the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images and uses 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance.

The Pascal Visual Object Classes (VOC) Challenge

Reviews the state of the art in the evaluated methods for both classification and detection, analyzing whether the methods are statistically different, what they learn from the images, and what they find easy or confuse.
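Comparisons of detection methods on VOC rest on average precision (AP) computed from a ranked list of detections. Below is a minimal sketch of two pieces of that protocol: box overlap as intersection-over-union, and the 11-point interpolated AP used in the VOC 2007-era evaluation (from VOC 2010 onward the challenge switched to the area under the full interpolated precision/recall curve). It assumes detections were already matched to ground truth upstream (a detection counts as correct when IoU exceeds 0.5 with an unmatched ground-truth box; duplicates count as false positives), and it uses continuous box coordinates rather than the official devkit's pixel conventions. The function names and input format are hypothetical, not the official devkit API.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def eleven_point_ap(is_true_positive, num_ground_truth):
    """11-point interpolated AP.

    is_true_positive: detections sorted by descending confidence, each True if it
    was matched to a ground-truth object (hypothetical, pre-matched input format).
    """
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    ap = 0.0
    for i in range(11):  # recall thresholds 0.0, 0.1, ..., 1.0
        r = i / 10
        # Interpolated precision: best precision at any recall >= r (0 if none).
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        ap += max(candidates) if candidates else 0.0
    return ap / 11


# Two ground-truth objects; ranked detections: hit, miss, hit.
ap = eleven_point_ap([True, False, True], num_ground_truth=2)  # 28/33, about 0.848
```

Taking the maximum precision at any recall at or above each threshold smooths out the zig-zag of the raw precision/recall curve, so a method is not penalized for a momentary precision dip that it later recovers from.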