Learning to Evaluate Perception Models Using Planner-Centric Metrics
@article{Philion2020LearningTE,
  title={Learning to Evaluate Perception Models Using Planner-Centric Metrics},
  author={Jonah Philion and Amlan Kar and Sanja Fidler},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={14052-14061}
}
Variants of accuracy and precision are the gold standard by which the computer vision community measures progress of perception algorithms. One reason for the ubiquity of these metrics is that they are largely task-agnostic; in general, we seek zero false negatives and zero false positives. The downside of these metrics is that, at worst, they penalize all incorrect detections equally without conditioning on the task or scene, and at best, heuristics need to be chosen to ensure that different…
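The planner-centric idea behind the paper's PKL metric can be illustrated with a minimal sketch: score a detector by how much it shifts a planner's distribution over future ego positions, measured as a KL divergence between the distribution conditioned on ground-truth objects and the one conditioned on the detector's output. The function name, the discretized bird's-eye-view grid, and the use of raw logits are assumptions for illustration; the actual metric uses a learned planner and sums over future timesteps.

```python
import numpy as np

def planner_kl(logits_gt, logits_pred, eps=1e-12):
    """Sketch of a planner-centric detection score.

    logits_gt:   planner logits over a discretized BEV grid of future ego
                 positions, conditioned on ground-truth objects.
    logits_pred: the same planner's logits, conditioned on detector output.
    Returns D_KL(p_gt || p_pred) summed over grid cells: 0 when the
    detector's errors do not change the plan, larger when they do.
    """
    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    p = softmax(logits_gt).ravel()
    q = softmax(logits_pred).ravel()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
grid = rng.normal(size=(64, 64))
perturbed = grid + rng.normal(size=(64, 64))

identical_score = planner_kl(grid, grid)        # ~0: plan unchanged
perturbed_score = planner_kl(grid, perturbed)   # > 0: plan shifted
```

This captures the key property the abstract argues for: detection errors are penalized in proportion to their effect on downstream planning, not uniformly.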
31 Citations
The efficacy of Neural Planning Metrics: A meta-analysis of PKL on nuScenes
- Computer Science · ArXiv · 2020
A neural planning metric (PKL), based on the KL divergence between a planner's trajectory and the ground-truth route, is used to score all submissions of the nuScenes detection challenge; it is found that, while somewhat correlated with mAP, the PKL metric shows different behavior with increased traffic density, ego velocity, road curvature, and intersections.
From Evaluation to Verification: Towards Task-oriented Relevance Metrics for Pedestrian Detection in Safety-critical Domains
- Computer Science · 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
This work considers pedestrian detection as a highly relevant perception task and argues that standard measures such as Intersection over Union (IoU) are insufficient, mainly because they are insensitive to important physical cues including distance, speed, and direction of motion.
LiDAR Cluster First and Camera Inference Later: A New Perspective Towards Autonomous Driving
- Computer Science · ArXiv · 2021
This paper presents a new end-to-end pipeline for autonomous vehicles that introduces the concept of LiDAR cluster first, camera inference later to detect and classify objects, and shows that this pipeline prioritizes the detection of higher-risk objects while achieving comparable accuracy and a 25% higher average speed compared to camera-only inference.
A Step Towards Efficient Evaluation of Complex Perception Tasks in Simulation
- Computer Science · ArXiv · 2021
This work proposes an approach that enables efficient large-scale testing using simplified low-fidelity simulators and without the computational cost of executing expensive deep learning models, and designs an efficient surrogate model corresponding to the compute intensive components of the task under test.
M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation
- Computer Science · ArXiv · 2022
M2BEV is memory efficient, allowing significantly higher-resolution images as input with faster inference speed, and achieves state-of-the-art results in both 3D object detection and BEV segmentation, with the best single model achieving 42.5 mAP and 57.0 mIoU in these two tasks.
Deep Multi-Task Learning for Joint Localization, Perception, and Prediction
- Computer Science · 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A system that jointly performs perception, prediction, and localization is designed, which is able to reuse computation between the three tasks, and is thus able to correct localization errors efficiently.
3D Object Detection for Autonomous Driving: A Review and New Outlooks
- Computer Science · ArXiv · 2022
This paper conducts a comprehensive survey of the progress in 3D object detection from the aspects of models and sensory inputs, including LiDAR-based, camera-based, and multi-modal detection approaches, and provides an in-depth analysis of the potentials and challenges in each category of methods.
Injecting Planning-Awareness into Prediction and Detection Evaluation
- Computer Science · IV · 2022
Experiments on an illustrative simulation as well as real-world autonomous driving data validate that the proposed task-aware metrics are able to account for outcome asymmetry and provide a better estimate of a model’s closed-loop performance.
Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
- Computer Science · ECCV · 2020
In pursuit of the goal of learning dense representations for motion planning, it is shown that the representations inferred by the model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by the network.
Quantity over Quality: Training an AV Motion Planner with Large Scale Commodity Vision Data
- Computer Science · ArXiv · 2022
This work shows it is possible to train a high-performance motion planner using commodity vision data which outperforms planners trained on HD-sensor data for a fraction of the cost, and is the first to demonstrate that this is possible using real-world data.
References
Showing 1-10 of 39 references
Monocular 3D Object Detection for Autonomous Driving
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground-plane, and achieves the best detection performance on the challenging KITTI benchmark, among published monocular competitors.
End to End Learning for Self-Driving Cars
- Computer Science · ArXiv · 2016
A convolutional neural network is trained to map raw pixels from a single front-facing camera directly to steering commands and it is argued that this will eventually lead to better performance and smaller systems.
PointPillars: Fast Encoders for Object Detection From Point Clouds
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds; a lean downstream network is also proposed.
Disentangling Monocular 3D Object Detection
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
An approach for monocular 3D object detection from a single RGB image is proposed, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes.
nuScenes: A Multimodal Dataset for Autonomous Driving
- Computer Science, Environmental Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object…
PIXOR: Real-time 3D Object Detection from Point Clouds
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PIXOR is proposed: a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions, surpassing other state-of-the-art methods notably in terms of Average Precision (AP) while still running at 10 FPS.
SECOND: Sparsely Embedded Convolutional Detection
- Computer Science · Sensors · 2018
An improved sparse convolution method for voxel-based 3D convolutional networks is investigated, which significantly increases the speed of both training and inference, and a new form of angle-loss regression is introduced to improve orientation estimation performance.
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection
- Computer Science · ArXiv · 2019
This report presents the method that won the nuScenes 3D Detection Challenge, proposing a balanced grouping head to boost performance for categories with similar shapes and achieving state-of-the-art detection performance on the nuScenes dataset.
STD: Sparse-to-Dense 3D Object Detector for Point Cloud
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
This work proposes a two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD), and implements a parallel intersection-over-union (IoU) branch to increase awareness of localization accuracy, resulting in further improved performance.
Bag of Freebies for Training Object Detection Neural Networks
- Computer Science · ArXiv · 2019
This work explores training tweaks that apply to various models including Faster R-CNN and YOLOv3 that can improve up to 5% absolute precision compared to state-of-the-art baselines.