Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training

@article{Li2022UnsupervisedDA,
  title={Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training},
  author={Zhenyu Li and Zehui Chen and Ang Li and Liangji Fang and Qinhong Jiang and Xianming Liu and Junjun Jiang},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.11590}
}
Monocular 3D object detection (Mono3D) has achieved unprecedented success with the advent of deep learning techniques and emerging large-scale autonomous driving datasets. However, drastic performance degradation remains an understudied challenge for practical cross-domain deployment, owing to the lack of labels on the target domain. In this paper, we first comprehensively investigate the significant underlying factor of the domain gap in Mono3D, where the critical observation is a depth-shift…
Towards Model Generalization for Monocular 3D Object Detection
TLDR
The 2D-3D geometry-consistent object scaling strategy (GCOS) is proposed to bridge the gap via instance-level augmentation; it achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme even without utilizing data on the target domain.

References

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency
TLDR
A novel and unified framework, Multi-Level Consistency Network (MLC-Net), employs a teacher-student paradigm to generate adaptive and reliable pseudo-targets and outperforms existing state-of-the-art methods on standard benchmarks.
ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection
TLDR
This work presents a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds, which achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection benchmark.
FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
TLDR
The solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020 and proposes a general framework FCOS3D, getting rid of any 2D detection or 2D-3D correspondence priors.
SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
TLDR
This paper argues that the 2D detection network is redundant and introduces non-negligible noise for 3D detection, and proposes a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables.
ST3D++: Denoised Self-training for Unsupervised Domain Adaptation on 3D Object Detection
TLDR
ST3D++ achieves state-of-the-art performance on all evaluated settings, outperforming the corresponding baseline by a large margin, and even surpasses the fully supervised oracle results on the KITTI 3D object detection benchmark with target prior.
Is Pseudo-Lidar needed for Monocular 3D Object detection?
TLDR
This work proposes an end-to-end, single stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations.
Orthographic Feature Transform for Monocular 3D Object Detection
TLDR
The orthographic feature transform is introduced, which enables us to escape the image domain by mapping image-based features into an orthographic 3D space and allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful.
Domain Adaptive Faster R-CNN for Object Detection in the Wild
TLDR
This work builds on the recent state-of-the-art Faster R-CNN model and designs two domain adaptation components, at the image level and the instance level, to reduce the domain discrepancy, based on $\mathcal{H}$-divergence theory.
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
  • G. Brazil, Xiaoming Liu
  • Computer Science
    2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
TLDR
M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
Unsupervised Domain Adaptation for Semantic Segmentation via Class-Balanced Self-training
TLDR
This paper proposes a novel UDA framework based on an iterative self-training (ST) procedure, where the problem is formulated as latent variable loss minimization and can be solved by alternately generating pseudo labels on target data and re-training the model with these labels.