Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
@article{Li2022DelvingIT,
  title   = {Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe},
  author  = {Hongyang Li and Chonghao Sima and Jifeng Dai and Wenhai Wang and Lewei Lu and Huijie Wang and Enze Xie and Zhiqi Li and Hanming Deng and Haonan Tian and Xizhou Zhu and Li Chen and Yulu Gao and Xiangwei Geng and Jianqiang Zeng and Yang Li and Jiazhi Yang and Xiaosong Jia and Bo Yu and Y. Qiao and Dahua Lin and Siqian Liu and Junchi Yan and Jianping Shi and Ping Luo},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2209.05324}
}
Learning powerful representations in bird's-eye view (BEV) for perception tasks is trending and drawing extensive attention from both industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a front or perspective view. As sensor configurations grow more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important. BEV perception inherits…
14 Citations
Geometric-aware Pretraining for Vision-centric 3D Object Detection
- Computer Science, ArXiv
- 2023
This work proposes GAPretrain, a novel geometric-aware pretraining framework that injects spatial and structural cues into camera networks by employing the geometry-rich modality as guidance during the pretraining phase; it serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction
- Computer Science
- 2023
A novel multi-camera unified pre-training framework called Occ-BEV, which first reconstructs the 3D scene as the foundational stage and then fine-tunes the model on downstream tasks, demonstrates promising results on key tasks such as multi-camera 3D object detection and semantic scene completion.
DeepSTEP - Deep Learning-Based Spatio-Temporal End-To-End Perception for Autonomous Vehicles
- Computer Science, ArXiv
- 2023
This concept for an end-to-end perception architecture combines detection and localization into a single pipeline, allowing efficient processing that reduces computational overhead and further improves overall performance, making it a promising solution for real-world deployment.
Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
- Computer Science, ArXiv
- 2023
The proposed refinement module can be stacked in a cascaded fashion, extending the capacity of the decoder with spatial-temporal prior knowledge about the conditioned future, and achieves state-of-the-art performance on closed-loop benchmarks.
Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving
- Computer Science, ArXiv
- 2023
A Bi-Mapper framework for top-down road-scene semantic understanding is proposed, incorporating a global view and local prior knowledge, together with an asynchronous mutual learning strategy to enhance reliable interaction between them.
Fusion is Not Enough: Single-Modal Attacks to Compromise Fusion Models in Autonomous Driving
- Computer Science, ArXiv
- 2023
It is argued that the weakest link of fusion models depends on their most vulnerable modality, and an attack framework that targets advanced camera-LiDAR fusion models with adversarial patches is proposed, demonstrating the effectiveness and practicality of the proposed attack framework.
Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving
- Computer Science, ArXiv
- 2023
The goal of Road Genome is to understand scene structure by investigating the relationships of perceived entities among traffic elements and lanes, introduced through OpenLane-V2, the newly minted benchmark.
Sparse Dense Fusion for 3D Object Detection
- Computer Science, ArXiv
- 2023
Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture, is proposed: a simple yet effective sparse-dense fusion structure that enriches semantic texture and exploits spatial structural information simultaneously.
Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
- Computer Science
- 2023
This paper evaluates the natural and adversarial robustness of various representative models under extensive settings to fully understand how their behavior is influenced by explicit BEV features compared with models without BEV, and proposes a 3D-consistent patch attack that applies adversarial patches in 3D space to guarantee spatiotemporal consistency.
Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
- Computer Science, ArXiv
- 2023
This paper designs 27 types of common corruptions for both LiDAR and camera inputs, reflecting real-world driving scenarios, and conducts large-scale experiments on 24 diverse 3D object detection models to evaluate their corruption robustness, drawing several important findings.
220 References
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
- Computer Science, Environmental Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This work introduces a new large-scale, high-quality, diverse dataset, consisting of well-synchronized and calibrated high-quality LiDAR and camera data captured across a range of urban and suburban geographies, and studies the effects of dataset size and generalization across geographies on 3D detection methods.
nuScenes: A Multimodal Dataset for Autonomous Driving
- Computer Science, Environmental Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object…
Are we ready for autonomous driving? The KITTI vision benchmark suite
- Computer Science, 2012 IEEE Conference on Computer Vision and Pattern Recognition
- 2012
The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework
- Computer Science, NeurIPS
- 2022
This work proposes a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on LiDAR input, thus addressing the downside of previous methods; it is the first to handle realistic LiDAR malfunction and can be deployed to realistic scenarios without any post-processing.
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
- Computer Science, Environmental Science, NeurIPS Datasets and Benchmarks
- 2021
Argoverse 2 (AV2), a collection of three datasets for perception and forecasting research in the self-driving domain that supports self-supervised learning and the emerging task of point cloud forecasting, is introduced.
Weighted boxes fusion: Ensembling boxes from different object detection models
- Environmental Science, Image Vis. Comput.
- 2021
Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
- Computer Science, ECCV
- 2020
In pursuit of the goal of learning dense representations for motion planning, it is shown that the representations inferred by the model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by the network.
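The "lift" step summarized above, predicting a categorical depth distribution per pixel and spreading each pixel's features along its camera ray via an outer product, can be sketched in a few lines of NumPy (all shapes and values below are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical sizes: D depth bins, C feature channels, H x W image features.
D, C, H, W = 4, 8, 16, 32

rng = np.random.default_rng(0)
context = rng.standard_normal((C, H, W))       # per-pixel context features
depth_logits = rng.standard_normal((D, H, W))  # per-pixel depth logits

# Softmax over depth bins: a categorical depth distribution per pixel.
depth_prob = np.exp(depth_logits - depth_logits.max(axis=0, keepdims=True))
depth_prob /= depth_prob.sum(axis=0, keepdims=True)

# Outer product: each pixel's feature vector is placed at every depth bin
# along its viewing ray, weighted by the predicted depth probability.
frustum = depth_prob[:, None] * context[None]  # shape (D, C, H, W)

# Since the distribution sums to 1, summing over depth recovers the feature.
assert frustum.shape == (D, C, H, W)
assert np.allclose(frustum.sum(axis=0), context)
```

In the full method these frustum features are then "splatted" into a BEV grid using camera geometry; the sketch covers only the lift.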
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
- Computer Science, ECCV
- 2020
This work proposes Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch, and presents 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively.
One Thousand and One Hours: Self-driving Motion Prediction Dataset
- Computer Science, CoRL
- 2020
This dataset was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California over a four-month period, and forms the largest, most complete, and most detailed dataset to date for the development of self-driving machine learning tasks such as motion forecasting, planning, and simulation.
Inverse perspective mapping simplifies optical flow computation and obstacle detection
- Mathematics, Biological Cybernetics
- 2004
It turns out that besides obstacle detection, inverse perspective mapping has additional advantages for regularizing optical flow algorithms.
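Inverse perspective mapping rests on a flat-ground assumption: each pixel's viewing ray is back-projected through the camera and intersected with the ground plane to recover a top-down position. A minimal NumPy sketch of that single-point mapping, with illustrative intrinsics, camera height, and pitch (none taken from the paper):

```python
import numpy as np

def ipm_point(u, v, K, R, cam_height):
    """Map pixel (u, v) to ground-plane (x, y) at z = 0 by intersecting the
    back-projected viewing ray with the flat-ground plane."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # pixel -> camera ray
    ray_world = R @ ray_cam                             # camera -> world frame
    t = -cam_height / ray_world[2]                      # solve h + t*ray_z = 0
    ground = np.array([0.0, 0.0, cam_height]) + t * ray_world
    return ground[:2]

# Illustrative parameters: 500 px focal length, camera 1.5 m above the
# ground, pitched 10 degrees downward.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pitch = np.deg2rad(10.0)
# Axis convention: world x forward, y left, z up; camera x right, y down,
# z forward. R maps camera coordinates into the world frame.
R0 = np.array([[0.0, 0.0, 1.0],
               [-1.0, 0.0, 0.0],
               [0.0, -1.0, 0.0]])
Ry = np.array([[np.cos(pitch), 0.0, np.sin(pitch)],
               [0.0, 1.0, 0.0],
               [-np.sin(pitch), 0.0, np.cos(pitch)]])
R = Ry @ R0

x, y = ipm_point(320.0, 240.0, K, R, cam_height=1.5)
# The principal-point ray hits the ground straight ahead: x = h / tan(pitch), y = 0.
```

The same homography-style warp applied to every pixel yields the classic top-down "bird's-eye" image, which is why IPM is often cited as an early precursor of BEV perception.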