Shared Cross-Modal Trajectory Prediction for Autonomous Driving

  • Chiho Choi
  • Published 1 April 2020
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Predicting future trajectories of traffic agents in highly interactive environments is an essential and challenging problem for the safe operation of autonomous driving systems. Because self-driving vehicles are equipped with various types of sensors (e.g., LiDAR scanner, RGB camera, radar, etc.), we propose a Cross-Modal Embedding framework that aims to benefit from the use of multiple input modalities. At training time, our model learns to embed a set of complementary…

Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction

This work proposes Social-SSL, a Transformer-based architecture that captures cross-sequence trajectory structures via self-supervised pre-training, which plays a crucial role in improving both data efficiency and generalizability of Transformer networks for trajectory prediction.

InAction: Interpretable Action Decision Making for Autonomous Driving

A novel Interpretable Action (InAction) decision-making model that provides enriched explanations from both explicit human annotation and implicit visual semantics, aiming to jointly align human-annotated explanations with action decisions.

PedFormer: Pedestrian Behavior Prediction via Cross-Modal Attention Modulation and Gated Multitask Learning

This work proposes a novel framework that relies on different data modalities to predict future trajectories and crossing actions of pedestrians from an ego-centric perspective and demonstrates that this model improves state-of-the-art in trajectory and action prediction by up to 22% and 13% respectively on various metrics.

Multi-Objective Diverse Human Motion Prediction with Knowledge Distillation

This work designs a prediction framework that can balance accuracy sampling and diversity sampling during the testing phase, and proposes a multi-objective conditional variational inference prediction model that includes a short-term oracle to encourage the framework to explore more diverse future motions.

HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding

A novel backbone, namely Heterogeneous Driving Graph Transformer (HDGT), which models the driving scene as a heterogeneous graph with different types of nodes and edges, and achieves new state-of-the-art results on INTERACTION Prediction Challenge and Waymo Open Motion Challenge.

Important Object Identification with Semi-Supervised Learning for Autonomous Driving

This work proposes a novel approach for important object identification in egocentric driving scenarios that performs relational reasoning over the objects in the scene, together with a semi-supervised learning pipeline that enables the model to learn from unlimited unlabeled data.

Human-Machine Shared Driving: Challenges and Future Directions

A systematic review of the major studies and developments of human-machine shared driving selected through a thorough and comprehensive search of the literature demonstrates that shared control approaches are mostly dependent on vehicle and environmental data obtained through various sensors.

Autonomous Vehicles: Open-Source Technologies, Considerations, and Development

This paper will introduce the reader to the technologies that build autonomous vehicles, and focus on open-source tools and libraries for autonomous vehicle development, making it cheaper and easier for developers and researchers to participate in the field.

A Review on Scene Prediction for Automated Driving

A quantitative comparison of model results reveals the dominance of deep learning methods in current state-of-the-art research in this area, with methods now competing at the centimeter scale of prediction accuracy.

Trajectory Forecasting Based on Prior-Aware Directed Graph Convolutional Neural Network

This work presents a directed graph convolutional neural network for multi-agent trajectory prediction and proposes three directed graph topologies, i.e., the view graph, direction graph, and rate graph, which endow the framework with the capability to effectively characterize the asymmetric influence between agents.

The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes

This work presents the Honda Research Institute 3D Dataset (H3D), a large-scale full-surround 3D multi-object detection and tracking dataset collected using a 3D LiDAR scanner, and proposes a labeling methodology to speed up the overall annotation cycle.

Vision meets robotics: The KITTI dataset

A novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research, using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras and a high-precision GPS/IMU inertial navigation system.

DROGON: A Causal Reasoning Framework for Future Trajectory Forecast

This work proposes DROGON (Deep RObust Goal-Oriented trajectory prediction Network) for accurate vehicle trajectory forecast by considering behavioral intention of vehicles in traffic scenes, and builds a conditional prediction model to forecast goal-oriented trajectories.

What the Constant Velocity Model Can Teach Us About Pedestrian Motion Prediction

This work shows how neural networks for pedestrian motion prediction can be thoroughly evaluated and which research directions for neural motion prediction are promising in future and clarifies false assumptions about the problem itself.
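The constant velocity model examined in that paper is a deliberately simple baseline: extrapolate the last observed displacement into the future. A minimal sketch of such a baseline (an illustration under that assumption, not the authors' code) looks like this:

```python
import numpy as np

def constant_velocity_predict(observed, horizon):
    """Extrapolate a trajectory by repeating the last observed velocity.

    observed: (T, 2) array of past (x, y) positions, T >= 2.
    horizon:  number of future steps to predict.
    Returns a (horizon, 2) array of predicted positions.
    """
    velocity = observed[-1] - observed[-2]      # last observed displacement
    steps = np.arange(1, horizon + 1)[:, None]  # (horizon, 1) step counters
    return observed[-1] + steps * velocity      # broadcast extrapolation

# A pedestrian walking along x at 0.5 m per step:
past = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
pred = constant_velocity_predict(past, horizon=3)
# pred → [[1.5, 0.], [2.0, 0.], [2.5, 0.]]
```

The paper's point is that learned predictors should be benchmarked against exactly this kind of trivial extrapolation, which is surprisingly hard to beat on short horizons.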

DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

The proposed Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of future predictions of multiple interacting agents in dynamic scenes significantly improves the prediction accuracy compared to other baseline methods.

Pyramid Scene Parsing Network

This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.
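The pyramid pooling idea can be illustrated with a toy NumPy version: average-pool a feature map into several grid sizes, upsample each grid back, and concatenate with the input. This is a simplified sketch of the concept only (PSPNet additionally applies learned convolutions and bilinear upsampling at each level):

```python
import numpy as np

def pyramid_pool(feature, bins=(1, 2, 3, 6)):
    """Toy pyramid pooling over a (C, H, W) feature map.

    For each bin size b, average-pool the map into a (b, b) grid,
    upsample it back to (H, W) by nearest neighbor, and concatenate
    all levels with the original map along the channel axis.
    H and W must be divisible by every bin size.
    """
    c, h, w = feature.shape
    levels = [feature]
    for b in bins:
        ph, pw = h // b, w // b
        # Average over each (ph, pw) cell -> (C, b, b) pooled grid.
        pooled = feature.reshape(c, b, ph, b, pw).mean(axis=(2, 4))
        # Nearest-neighbor upsample back to (C, H, W).
        up = np.repeat(np.repeat(pooled, ph, axis=1), pw, axis=2)
        levels.append(up)
    return np.concatenate(levels, axis=0)

feat = np.random.rand(4, 6, 6)
out = pyramid_pool(feat, bins=(1, 2, 3))
# out.shape → (16, 6, 6): original 4 channels plus 4 per pyramid level
```

The coarsest level (bin size 1) injects the global average of each channel everywhere, which is what gives the network whole-scene context.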

Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems

A novel approach is introduced that simultaneously predicts both the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle, using a multi-stream recurrent neural network encoder-decoder model that separately captures object location, scale, and pixel-level observations for future vehicle localization.

DROGON: A Trajectory Prediction Model based on Intention-Conditioned Behavior Reasoning

The proposed framework for accurate vehicle trajectory prediction by considering behavioral intentions of vehicles in traffic scenes is extended to the pedestrian trajectory prediction task, showing the potential applicability toward general trajectory prediction.

EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning

This paper proposes a generic trajectory forecasting framework with explicit relational structure recognition and prediction via latent interaction graphs among multiple heterogeneous, interactive agents and introduces a double-stage training pipeline which not only improves training efficiency and accelerates convergence, but also enhances model performance.

Trajectron++: Dynamically-Feasible Trajectory Forecasting with Heterogeneous Data

Trajectron++ is a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data and outperforming a wide array of state-of-the-art deterministic and generative methods.