Deep Multi-Task Learning for Joint Localization, Perception, and Prediction

John Phillips, Julieta Martinez, Ioan Andrei Bârsan, Sergio Casas, Abbas Sadat, Raquel Urtasun
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Over the last few years, we have witnessed tremendous progress on many subtasks of autonomous driving including perception, motion forecasting, and motion planning. However, these systems often assume that the car is accurately localized against a high-definition map. In this paper we question this assumption, and investigate the issues that arise in state-of-the-art autonomy stacks under localization error. Based on our observations, we design a system that jointly performs perception… 

3D Object Detection for Autonomous Driving: A Review and New Outlooks
This paper conducts a comprehensive survey of the progress in 3D object detection from the aspects of models and sensory inputs, including LiDAR-based, camera-based, and multi-modal detection approaches, and provides an in-depth analysis of the potentials and challenges in each category of methods.
Multi-Task Learning with Multi-query Transformer for Dense Prediction
This work proposes a simpler pipeline named Multi-Query Transformer (MQTransformer) that is equipped with multiple queries from different tasks to facilitate the reasoning among multiple tasks and simplify the cross-task pipeline.
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
The multi-task BEVerse is shown to outperform existing single-task methods on 3D object detection, semantic map construction, and motion prediction in extensive experiments on the nuScenes dataset, while also significantly improving efficiency.
On Steering Multi-Annotations per Sample for Multi-Task Learning
Stochastic Task Allocation (STA) is introduced, a mechanism that addresses the issue of optimally learning different tasks simultaneously by a task allocation approach, in which each sample is randomly allocated a subset of tasks.
LatentFormer: Multi-Agent Transformer-Based Interaction Modeling and Trajectory Prediction
LatentFormer, a transformer-based model for predicting future vehicle trajectories, is proposed that leverages a novel technique for modeling interactions among dynamic objects in the scene and achieves state-of-the-art performance and improves upon trajectory metrics by up to 40%.
Multi-Task Neural Processes
The proposed multi-task neural processes derive the function priors in a hierarchical Bayesian inference framework, which enables each task to incorporate the shared knowledge provided by related tasks into its context of the prediction function.
Reimagining an autonomous vehicle
It is argued that a rethink is required, reconsidering the autonomous vehicle problem in the light of the body of knowledge that has been gained since the DARPA challenges, and an alternative vision is presented: a recipe for driving with machine learning, and grand challenges for research in driving.


MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving
This paper presents an approach to joint classification, detection, and semantic segmentation using a unified architecture in which the encoder is shared among the three tasks, and which performs extremely well on the challenging KITTI dataset.
VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry
The VLocNet++ architecture is proposed that employs a multitask learning approach to exploit the inter-task relationship between learning semantics, regressing 6-DoF global pose and odometry, for the mutual benefit of each of these tasks.
DSDNet: Deep Structured self-Driving Network
A deep structured energy-based model that considers the interactions between actors and produces socially consistent multimodal future predictions, and that explicitly exploits the predicted future distributions of actors to plan a safe maneuver by using a structured planning cost.
PnPNet: End-to-End Perception and Prediction With Tracking in the Loop
This work proposes PnPNet, an end-to-end model that takes as input sequential sensor data, and outputs at each time step object tracks and their future trajectories, and shows significant improvements over the state-of-the-art with better occlusion recovery and more accurate future prediction.
Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
A novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor is proposed, which is very efficient in terms of both memory and computation.
From Coarse to Fine: Robust Hierarchical Localization at Large Scale
HF-Net is proposed, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization and sets a new state-of-the-art on two challenging benchmarks for large-scale localization.
End-To-End Interpretable Neural Motion Planner
A holistic model is designed that takes as input raw LIDAR data and an HD map and produces interpretable intermediate representations in the form of 3D detections and their future trajectories, as well as a cost volume defining the goodness of each position that the self-driving car can take within the planning horizon.
Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
A principled approach to multi-task deep learning is proposed which weighs multiple loss functions by considering the homoscedastic uncertainty of each task, allowing us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations
A novel end-to-end learnable network that performs joint perception, prediction, and motion planning for self-driving vehicles and produces interpretable intermediate representations. This is achieved through a novel differentiable semantic occupancy representation that is explicitly used as a cost by the motion planning process.
Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks
This work presents a method to predict multiple possible trajectories of actors while also estimating their probabilities; the method was successfully tested on SDVs in closed-course tests.