Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection

@inproceedings{Dwibedi2017CutPA,
  title={Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection},
  author={Debidatta Dwibedi and Ishan Misra and Martial Hebert},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={1310--1319}
}
A major impediment to rapidly deploying object detection models for instance detection is the lack of large annotated datasets. […] We automatically ‘cut’ object instances and ‘paste’ them on random backgrounds. A naive way of doing this produces pixel artifacts that lead to poor performance for trained models. We show how to make detectors ignore these artifacts during training and generate data that gives competitive performance on real data. Our method outperforms existing synthesis approaches…
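The cut-and-paste step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `feather_mask` and `paste_instance`, the box-blur feathering (a simple stand-in for a smoother blend at the object boundary), and the assumption that the object crop is already tight around its mask are all hypothetical choices made for illustration.

```python
import numpy as np

def feather_mask(mask, radius=2):
    """Soften a binary mask with a box blur (assumed stand-in for a smoother blend)."""
    mask = mask.astype(np.float32)
    if radius <= 0:
        return mask
    k = 2 * radius + 1
    padded = np.pad(mask, radius, mode="edge")
    out = np.zeros_like(mask)
    # Sum the k*k shifted copies of the padded mask, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out / (k * k)

def paste_instance(background, obj, mask, rng, radius=2):
    """Paste `obj` (h, w, 3) onto a copy of `background` (H, W, 3) at a random location.

    Assumes `obj` is cropped tightly around `mask`, so the crop rectangle
    doubles as the bounding-box label (x_min, y_min, x_max, y_max).
    """
    H, W = background.shape[:2]
    h, w = obj.shape[:2]
    if h > H or w > W:
        raise ValueError("object must fit inside the background")
    # Random top-left corner such that the object stays inside the image.
    y = int(rng.integers(0, H - h + 1))
    x = int(rng.integers(0, W - w + 1))
    alpha = feather_mask(mask, radius)[..., None]
    out = background.astype(np.float32)
    region = out[y:y + h, x:x + w]
    # Alpha-composite the object over the background region.
    out[y:y + h, x:x + w] = alpha * obj.astype(np.float32) + (1.0 - alpha) * region
    return out.astype(background.dtype), (x, y, x + w, y + h)

# Usage: paste a white 16x16 square onto a black 64x64 background.
rng = np.random.default_rng(0)
background = np.zeros((64, 64, 3), dtype=np.uint8)
obj = np.full((16, 16, 3), 255, dtype=np.uint8)
mask = np.ones((16, 16), dtype=np.uint8)
image, box = paste_instance(background, obj, mask, rng)
```

Because the paste location is sampled, the bounding-box annotation comes for free with each synthesized image. Varying `radius` (including `radius=0` for a hard paste) is one plausible way to expose a detector to several boundary appearances of the same object.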
An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Instance Detection
TLDR
This work proposes a novel method for creating purely synthetic training data for object detection that uses a large dataset of 3D background models and densely renders them using full domain randomization, enabling the training of detectors that outperform models trained with real data on a challenging evaluation dataset.
An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Detection
TLDR
This work proposes a novel method for creating purely synthetic training data for object detection that enables the training of detectors that compete favorably with models trained on real data while being at least two orders of magnitude more time and cost effective with respect to data annotation.
Images Instances Using context guidance Random instance placement Copy-Paste Data Augmentation New Training Examples
TLDR
This work proposes an explicit context model, a convolutional neural network that predicts whether an image region is suitable for placing a given object, and uses it to improve object detection and semantic and instance segmentation on the PASCAL VOC12 and COCO datasets.
Balancing Domain Gap for Object Instance Detection
TLDR
This paper identifies that the domain gaps of foreground and background are unbalanced and proposes methods to balance them, improving the accuracy of object instance detection in cluttered indoor environments.
Detecting Objects from No-Object Regions: A Context-Based Data Augmentation for Object Detection
TLDR
This work proposes a trainable context model that finds proper placement regions by classifying and refining dense prior default boxes, and generates training examples by annotating ground truth on free space according to the placement rules.
Learning to Detect Every Thing in an Open World
TLDR
A new data augmentation method, BackErase, is developed, which pastes annotated objects on a background image sampled from a small region of the original image to avoid suppressing hidden objects.
Improving generalization with synthetic training data for deep learning based quality inspection
TLDR
This work demonstrates that randomly generated synthetic training images can help tackle domain instability issues, making the trained models more robust to contextual changes.
Boosting Instance Segmentation with Synthetic Data: A study to overcome the limits of real world data sets
TLDR
This paper presents a simple approach that combines synthetic and real images to boost instance segmentation, together with a training strategy based on data set mixing that overcomes the domain shift between real and synthetic data sets.
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
TLDR
A systematic study of the Copy-Paste augmentation for instance segmentation where the authors randomly paste objects onto an image finds that the simple mechanism of pasting objects randomly is good enough and can provide solid gains on top of strong baselines.
Gram-SLD: Automatic Self-labeling and Detection for Instance Objects
TLDR
The proposed Gram-SLD automatically annotates a large amount of data from very limited manually labeled key data, achieves competitive performance in object detection, and satisfies the real-time and accuracy requirements of instance object detection.

References

SHOWING 1-10 OF 55 REFERENCES
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Learning Deep Object Detectors from 3D Models
TLDR
This work shows that augmenting the training data of contemporary Deep Convolutional Neural Net (DCNN) models with such synthetic data can be effective, especially when real training data is limited or not well matched to the target domain.
Synthesizing Training Data for Object Detection in Indoor Scenes
TLDR
This work charts new opportunities for training detectors for new objects by exploiting existing object model repositories in either a purely automatic fashion or with only a very small number of human-annotated examples.
SSD: Single Shot MultiBox Detector
TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes
TLDR
This paper generates a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations, and conducts experiments with DCNNs that show how the inclusion of SYNTHIA in the training stage significantly improves performance on the semantic segmentation task.
You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene
SceneNet: Understanding Real World Indoor Scenes With Synthetic Data
TLDR
This work focuses its attention on depth based semantic per-pixel labelling as a scene understanding problem and shows the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes by carefully synthesizing training data with appropriate noise models.
A dataset for developing and benchmarking active vision
TLDR
It is shown that, although increasingly accurate and fast, the state of the art for object detection is still severely impacted by object scale, occlusion, and viewing direction all of which matter for robotics applications.
Virtual Worlds as Proxy for Multi-object Tracking Analysis
TLDR
This work proposes an efficient real-to-virtual world cloning method and validates the approach by building and publicly releasing a new video dataset, called "Virtual KITTI", automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow.