Microsoft COCO: Common Objects in Context

@inproceedings{Lin2014MicrosoftCC,
  title={Microsoft COCO: Common Objects in Context},
  author={Tsung-Yi Lin and Michael Maire and Serge J. Belongie and James Hays and Pietro Perona and Deva Ramanan and Piotr Doll{\'a}r and C. Lawrence Zitnick},
  booktitle={ECCV},
  year={2014}
}
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. [...] Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
Semantic Understanding of Scenes Through the ADE20K Dataset
TLDR
This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.
Best of both worlds: human-machine collaboration for object annotation (preliminary version)
Despite the recent boom in large-scale object detection, the long-standing goal of localizing every object in the image remains elusive. The current best object detectors can accurately detect at
DeepScores-A Dataset for Segmentation, Detection and Classification of Tiny Objects
TLDR
A detailed statistical analysis of the DeepScores dataset is presented, comparing it with other computer vision datasets like PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, as well as with other OMR datasets.
Shape-aware Instance Segmentation
TLDR
This paper introduces a novel object segment representation based on the distance transform of the object masks, and designs an object mask network (OMN) with a new residual-deconvolution architecture that infers such a representation and decodes it into the final binary object mask.
Pixel Objectness
TLDR
This work proposes an end-to-end learning framework for foreground object segmentation that substantially improves the state-of-the-art on foreground segmentation on the ImageNet and MIT Object Discovery datasets and generalizes well to segment object categories unseen in the foreground maps used for training.
In pixels we trust: From Pixel Labeling to Object Localization and Scene Categorization
TLDR
This paper proposes to tackle the problems of image pixel labeling, object detection and scene classification from a bottom-up perspective that takes a semantic segmentation of the scene as input, using the DeepLab architecture based on the ResNet deep network.
Object Counting and Instance Segmentation With Image-Level Supervision
TLDR
This work is the first to propose image-level supervised density map estimation for common object counting and demonstrates its effectiveness in image-level supervised instance segmentation; it outperforms existing methods, including those using instance-level supervision, on both datasets for common object counting.
Blending the Past and Present of Automatic Image Annotation
Real world images depict varying scenes, actions and multiple objects interacting with each other. We consider the fundamental Computer Vision problem of image annotation, where an image needs to be
Best of both worlds: Human-machine collaboration for object annotation
TLDR
This paper empirically validates the effectiveness of the human-in-the-loop labeling approach on the ILSVRC2014 object detection dataset and seamlessly integrates multiple computer vision models with multiple sources of human input in a Markov Decision Process.
LVIS: A Dataset for Large Vocabulary Instance Segmentation
TLDR
This work introduces LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation, which has a long tail of categories with few training samples due to the Zipfian distribution of categories in natural images.

References

Learning Spatial Context: Using Stuff to Find Things
TLDR
This paper clusters image regions based on their ability to serve as context for the detection of objects and shows that the things and stuff (TAS) context model produces meaningful clusters that are readily interpretable, and helps improve detection ability over state-of-the-art detectors.
LabelMe: A Database and Web-Based Tool for Image Annotation
TLDR
A web-based tool that allows easy image annotation and instant sharing of such annotations is developed and a large dataset that spans many object categories, often containing multiple instances over a wide variety of images is collected.
Semantic object classes in video: A high-definition ground truth database
TLDR
The Cambridge-driving Labeled Video Database (CamVid) is presented as the first collection of videos with object class semantic labels, complete with metadata, and the relevance of the database is evaluated by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation.
Object segmentation by alignment of poselet activations to image contours
TLDR
This paper builds upon the part-based poselet detector, which can predict masks for numerous parts of an object, and extends poselets to 19 other categories apart from person.
Using Segmentation to Verify Object Hypotheses
  • D. Ramanan
  • Computer Science
    2007 IEEE Conference on Computer Vision and Pattern Recognition
  • 2007
TLDR
An approach for object recognition that combines detection and segmentation within an efficient hypothesize/test framework, leading to significant improvements over established approaches such as Viola-Jones and Dalal-Triggs on a variety of benchmark datasets including the PASCAL challenge, LabelMe, and the INRIAPerson dataset.
80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition
TLDR
For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Caltech-256 Object Category Dataset
We introduce a challenging set of 256 object categories containing a total of 30607 images. The original Caltech-101 [1] was collected by choosing a set of object categories, downloading examples
SUN database: Large-scale scene recognition from abbey to zoo
TLDR
This paper proposes the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images and uses 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance.
The Pascal Visual Object Classes (VOC) Challenge
TLDR
The state of the art in evaluated methods for both classification and detection is reviewed, including whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.