FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection

@article{wang2021fcos3d,
  title={FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection},
  author={Tai Wang and Xinge Zhu and Jiangmiao Pang and Dahua Lin},
  journal={2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2021}
}
  • Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
  • Published 22 April 2021
  • Computer Science
  • 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Monocular 3D object detection is an important task for autonomous driving considering its advantage of low cost. It is much more challenging than conventional 2D cases due to its inherently ill-posed property, which is mainly reflected in the lack of depth information. Recent progress on 2D detection offers opportunities to better solve this problem. However, it is non-trivial to make a generally adapted 2D detector work in this 3D task. In this paper, we study this problem with a practice built…
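The "ill-posed property" in the abstract can be made concrete with a pinhole-projection sketch (hypothetical camera intrinsics, not from the paper): scaling a 3D point together with its depth leaves the image projection unchanged, so depth cannot be recovered from a single pixel alone.

```python
def project(point_3d, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0):
    """Project a camera-frame 3D point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point_3d
    return (fx * x / z + cx, fy * y / z + cy)

near = (1.0, 0.5, 10.0)   # object 10 m away
far = (2.0, 1.0, 20.0)    # same viewing ray, twice the depth
assert project(near) == project(far)  # identical pixels: depth is ambiguous
```

This is why monocular 3D detectors must infer depth from appearance and context rather than geometry alone.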

Towards Model Generalization for Monocular 3D Object Detection

The 2D-3D geometry-consistent object scaling strategy (GCOS) is proposed to bridge the gap via instance-level augmentation; it achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme even without utilizing data from the target domain.

Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection

The key idea is that with the annotated 3D bounding boxes of objects in an image, there is a rich set of well-posed projected 2D supervision signals available in training, such as the projected corner keypoints and their associated offset vectors with respect to the center of the 2D box, which should be exploited as auxiliary tasks in training.
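The projected-keypoint supervision described above can be sketched as follows; the box geometry and intrinsics here are hypothetical, and the real method operates on learned features rather than this toy geometry:

```python
from itertools import product

def box_corners(center, dims):
    """8 corners of an axis-aligned 3D box in the camera frame."""
    cx, cy, cz = center
    w, h, l = dims
    return [(cx + sx * w / 2, cy + sy * h / 2, cz + sz * l / 2)
            for sx, sy, sz in product((-1, 1), repeat=3)]

def project(p, f=1000.0, cx=640.0, cy=360.0):
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)

# Project the 8 annotated 3D corners into the image ...
corners_2d = [project(c) for c in box_corners((0.0, 0.0, 20.0), (2.0, 1.5, 4.0))]
us, vs = zip(*corners_2d)
# ... take the enclosing 2D box center, and form per-corner offset vectors,
# which serve as the auxiliary regression targets.
center_2d = ((min(us) + max(us)) / 2, (min(vs) + max(vs)) / 2)
offsets = [(u - center_2d[0], v - center_2d[1]) for u, v in corners_2d]
```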

MonoDistill: Learning Spatial Features for Monocular 3D Object Detection

A simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors, without introducing any extra cost in the inference phase, and can significantly boost the performance of the baseline model.

FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection

This paper presents FCAF3D, a first-in-class fully convolutional anchor-free indoor 3D object detection method that uses a voxel representation of a point cloud and processes the voxels with sparse convolutions; it also proposes a novel parametrization of oriented bounding boxes that yields better results in a purely data-driven way.

A Simple Baseline for Multi-Camera 3D Object Detection

This paper presents SimMOD, a baseline for monocular 3D object detection with surrounding cameras, built on sample-wise object proposals and designed to work in a two-stage manner, with auxiliary branches alongside proposal generation to enhance feature learning.

3D Object Detection from Images for Autonomous Driving: A Survey

This paper provides the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection and deeply analyzing each of their components and proposing two new taxonomies to organize the state-of-the-art methods into different categories.

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection

This paper enables the vanilla transformer to be depth-aware and enforce the whole detection process guided by depth, which achieves competitive performance on KITTI benchmark among state-of-the-art center-based networks.

Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training

STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D, is proposed, together with a quality-aware supervision strategy that takes instance-level pseudo confidences into account to improve the effectiveness of target-domain training.

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

The BEVDet paradigm is contributed, which performs 3D object detection in Bird-Eye-View (BEV), where most target values are defined and route planning can be handily performed; it substantially improves performance by constructing an exclusive data augmentation strategy and upgrading the Non-Maximum Suppression strategy.

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

In this paper, we propose PETRv2, a unified framework for 3D perception from multi-view images. Based on PETR [24], PETRv2 explores the effectiveness of temporal modeling, which utilizes the temporal…

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

M3D-RPN is able to significantly improve performance on both the monocular 3D object detection and bird's-eye-view tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.

Learning Depth-Guided Convolutions for Monocular 3D Object Detection

  • Mingyu Ding, Yuqi Huo, P. Luo
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2020
D4LCN overcomes the limitation of conventional 2D convolutions and narrows the gap between image representation and 3D point cloud representation, where the filters and their receptive fields can be automatically learned from image-based depth maps.

FCOS: Fully Convolutional One-Stage Object Detection

For the first time, a much simpler and more flexible detection framework that achieves improved detection accuracy is demonstrated, and it is hoped that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks.
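FCOS (which FCOS3D extends to 3D) regresses, at each foreground location, the distances to the four box sides plus a "center-ness" score that down-weights off-center predictions. A small illustrative sketch of those targets (not the authors' code):

```python
import math

def fcos_targets(px, py, box):
    """Per-location FCOS-style targets for location (px, py) and box (x1, y1, x2, y2):
    distances (l, t, r, b) to the box sides, plus the center-ness score
    sqrt(min(l,r)/max(l,r) * min(t,b)/max(t,b))."""
    x1, y1, x2, y2 = box
    l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
    centerness = math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
    return (l, t, r, b), centerness

# A location at the exact box center has center-ness 1.0
_, c = fcos_targets(50, 50, (0, 0, 100, 100))
assert c == 1.0
```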

RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving

This work proposes an efficient and accurate single-shot monocular 3D detection framework that achieves state-of-the-art performance on the KITTI benchmark; it predicts the nine perspective keypoints of a 3D bounding box in image space and utilizes the geometric relationship between the 3D and 2D perspectives.

Orthographic Feature Transform for Monocular 3D Object Detection

The orthographic feature transform is introduced, which enables us to escape the image domain by mapping image-based features into an orthographic 3D space and allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful.
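The core idea, lifting image features into a metric orthographic grid, can be sketched roughly as follows; the grid extents, camera parameters, and dictionary-based "feature map" are all stand-ins for the paper's learned CNN features:

```python
def project(p, f=500.0, cx=320.0, cy=240.0):
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)

def image_to_bev(feat, width, height,
                 x_range=(-10, 10), z_range=(1, 41), cell=1.0, cam_h=1.5):
    """Populate a metric BEV grid: project each ground-plane cell center into
    the image and sample the feature there. `feat` maps integer pixel (u, v)
    to a feature value (a stand-in for a CNN feature map)."""
    bev = {}
    x = x_range[0]
    while x < x_range[1]:
        z = z_range[0]
        while z < z_range[1]:
            u, v = project((x + cell / 2, cam_h, z + cell / 2))
            ui, vi = int(u), int(v)
            if 0 <= ui < width and 0 <= vi < height:  # cell visible in the image
                bev[(x, z)] = feat.get((ui, vi), 0.0)
            z += cell
        x += cell
    return bev
```

In the BEV grid, one cell always covers the same metric area, which is the "scale is consistent" property the summary refers to.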

Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss

It is argued that the SS3D architecture provides a solid framework upon which high performing detection systems can be built, with autonomous driving being the main application in mind.
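The intersection-over-union quantity named in the title, shown here for axis-aligned 2D boxes as a toy illustration (the paper trains with IoU-based losses end-to-end; a loss would be 1 - IoU):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

assert iou((0, 0, 2, 2), (1, 1, 3, 3)) == 1.0 / 7.0  # overlap 1, union 4+4-1
assert iou((0, 0, 1, 1), (2, 2, 3, 3)) == 0.0        # disjoint boxes
```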

MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships

This work proposes a novel method to improve the monocular 3D object detection by considering the relationship of paired samples, which allows us to encode spatial constraints for partially-occluded objects from their adjacent neighbors.

Disentangling Monocular 3D Object Detection

An approach for monocular 3D object detection from a single RGB image, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes is proposed.

Kinematic 3D Object Detection in Monocular Video

This work proposes a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve the precision of 3D localization, and achieves state-of-the-art performance on the monocular 3D object detection and Bird's Eye View tasks within the KITTI self-driving dataset.

Multi-level Fusion Based 3D Object Detection from Monocular Images

  • Bin Xu, Zhenzhong Chen
  • Computer Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
This paper introduces an end-to-end multi-level fusion based framework for 3D object detection from a single monocular image and demonstrates that the proposed algorithm significantly outperforms monocular state-of-the-art methods.