Corpus ID: 233346750

Exploring 2D Data Augmentation for 3D Monocular Object Detection

@article{Sugirtha2021Exploring2D,
  title={Exploring 2D Data Augmentation for 3D Monocular Object Detection},
  author={T. Sugirtha and M. Sridevi and Khailash Santhakumar and Bangalore Ravi Kiran and Thomas Gauthier and Senthil Kumar Yogamani},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.10786}
}
Data augmentation is a key component of CNN-based image recognition tasks like object detection. However, it is relatively less explored for 3D object detection. Many standard 2D object detection data augmentation techniques do not extend to 3D boxes. Extending these augmentations to 3D object detection requires adapting the 3D geometry of the input scene and synthesizing new viewpoints, which in turn requires accurate depth information of the scene that may not always be available. In this…
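As a rough illustration of why a 2D augmentation must also touch the 3D annotation, the sketch below (not taken from the paper; the function name, label layout, and KITTI-style conventions are assumptions) applies a horizontal flip to an image and mirrors the corresponding 3D box centre and yaw. It assumes the principal point lies at the image centre; a more faithful version would also mirror cx in the camera matrix.

    import math

    def flip_horizontal(image, box2d, loc, ry):
        # image: HxWxC numpy array; box2d: (x1, y1, x2, y2) in pixels;
        # loc: 3D box centre (x, y, z) in camera coordinates; ry: yaw around the Y axis.
        h, w = image.shape[:2]
        flipped = image[:, ::-1].copy()            # mirror the pixels left-right

        x1, y1, x2, y2 = box2d
        box2d_flipped = (w - x2, y1, w - x1, y2)   # mirror the 2D box

        x, y, z = loc
        loc_flipped = (-x, y, z)                   # mirror the 3D centre about the x = 0 plane

        ry_flipped = math.pi - ry                  # mirror the yaw angle
        if ry_flipped > math.pi:                   # wrap back into (-pi, pi]
            ry_flipped -= 2.0 * math.pi

        return flipped, box2d_flipped, loc_flipped, ry_flipped

Flipping is the easy case; augmentations such as cropping or scaling change the effective camera intrinsics and hence the projection of the 3D box into the image, which is the difficulty the abstract refers to.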
1 Citation

Interactive Multimodal Attention Network for Emotion Recognition in Conversation
TLDR
Empirical evaluations on the multimodal IEMOCAP benchmark dataset demonstrate that IMAN achieves competitive performance compared to state-of-the-art approaches.

References

Showing 1-10 of 20 references
SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
TLDR
This paper argues that the 2D detection network is redundant and introduces non-negligible noise for 3D detection, and proposes SMOKE, a novel 3D object detection method that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables.
GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
TLDR
This work leverages an off-the-shelf 2D object detector to efficiently obtain a coarse cuboid for each predicted 2D box and explores the 3D structure information of the object by employing the visual features of visible surfaces.
Monocular 3D Detection With Geometric Constraint Embedding and Semi-Supervised Training
TLDR
This work designs a fully convolutional model to predict object keypoints, dimensions, and orientation, combines these with perspective geometry constraints to compute position attributes, and proposes an effective semi-supervised training strategy for settings where labeled training data are scarce.
RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving
TLDR
This work proposes an efficient and accurate single-shot monocular 3D detection framework that predicts the nine perspective keypoints of a 3D bounding box in image space, exploits the geometric relationship between the 3D and 2D perspectives, and achieves state-of-the-art performance on the KITTI benchmark.
Monocular 3D Object Detection for Autonomous Driving
TLDR
This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should lie on the ground plane, and achieves the best detection performance on the challenging KITTI benchmark among published monocular competitors.
ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape
TLDR
This work proposes a novel loss formulation by lifting 2D detection, orientation, and scale estimation into 3D space and demonstrates that this approach doubles the AP on the 3D pose metrics on the official test set, defining the new state of the art.
Multi-Modality Cut and Paste for 3D Object Detection
TLDR
This paper presents a new multi-modality augmentation approach, Multi-mOdality Cut and pAste (MoCa), which boosts detection performance by cutting point cloud and imagery patches of ground-truth objects and pasting them into different scenes in a consistent manner while avoiding collision between objects.
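For intuition only, the fragment below sketches the image-side half of such a cut-and-paste step. It is not MoCa itself, which cuts matching point-cloud and image patches and checks for collisions between pasted objects; the function name and box format here are assumptions.

    def paste_object(src_img, src_box, dst_img, dst_boxes):
        # Cut the patch inside src_box = (x1, y1, x2, y2) from src_img and paste it
        # at the same pixel location in dst_img (both numpy images of the same size).
        x1, y1, x2, y2 = (int(v) for v in src_box)
        out = dst_img.copy()
        out[y1:y2, x1:x2] = src_img[y1:y2, x1:x2]   # overwrite pixels with the cut patch
        return out, list(dst_boxes) + [src_box]     # the pasted object keeps its source label

A usable version would also reject pastes whose box overlaps an existing ground-truth box, mirroring the collision avoidance described above.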
Albumentations: fast and flexible image augmentations
TLDR
Albumentations is presented, a fast and flexible open-source library for image augmentation that offers a wide variety of image transform operations and an easy-to-use wrapper around other augmentation libraries.
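For reference, a minimal Albumentations pipeline for box-aware 2D augmentation looks roughly like the following; the specific transforms, the pascal_voc box format, and the placeholder data are illustrative choices, not prescribed by the library.

    import numpy as np
    import albumentations as A

    image = np.zeros((375, 1242, 3), dtype=np.uint8)   # placeholder KITTI-sized image
    bboxes = [(100, 120, 300, 280)]                     # one 2D box as (x_min, y_min, x_max, y_max)
    class_labels = ["Car"]

    # BboxParams keeps the 2D boxes in sync with whatever the image transforms do.
    transform = A.Compose(
        [A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.2)],
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
    )
    out = transform(image=image, bboxes=bboxes, class_labels=class_labels)
    augmented_image, augmented_boxes = out["image"], out["bboxes"]

Note that such a pipeline only keeps the 2D boxes consistent; the 3D labels are untouched, which is exactly the gap the paper above addresses.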
Are we ready for autonomous driving? The KITTI vision benchmark suite
TLDR
The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world.
YOLOv4: Optimal Speed and Accuracy of Object Detection
TLDR
This work combines new features such as WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss to achieve state-of-the-art results: 43.5% AP on the MS COCO dataset at a real-time speed of ~65 FPS on a Tesla V100.
...