StarNet: Joint Action-Space Prediction with Star Graphs and Implicit Global-Frame Self-Attention

Faris Janjos, Maxim Dolgov, Johann Marius Zöllner. 2022 IEEE Intelligent Vehicles Symposium (IV).

In this work, we present a novel multi-modal multi-agent trajectory prediction architecture, focusing on map and interaction modeling using graph representation. For the purposes of map modeling, we capture rich topological structure into vector-based star graphs, which enable an agent to directly attend to relevant regions along polylines that are used to represent the map. We denote this architecture StarNet, and integrate it into a single-agent prediction setting. As the main result, we… 

DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving

DiPA is presented, an interactive predictor that achieves new state-of-the-art performance on the INTERACTION and NGSIM datasets, and improves over the baseline (MFP) when both closest-mode and probabilistic evaluations are used.

Deep Occupancy-Predictive Representations for Autonomous Driving

This work proposes to learn which features are task-relevant in traffic environments and encodes the probabilistic occupancy map as a proxy for obtaining pre-trained state representations, and shows that this approach significantly improves the downstream performance of a reinforcement learning agent operating in urban traffic environments.

Geometric Deep Learning for Autonomous Driving: Unlocking the Power of Graph Neural Networks With CommonRoad-Geometric

This work proposes an easy-to-use and fully customizable data processing pipeline to extract standardized graph datasets from traffic scenarios and provides a platform for GNN-based autonomous driving research, improves comparability between approaches and allows researchers to focus on model implementation instead of dataset curation.

GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting

Pair-wise relative positional encodings are used to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph, enabling diverse and context-aware multimodal prediction.

DiPA: Diverse and Probabilistically Accurate Interactive Prediction

DiPA is presented, a method for producing diverse predictions while also capturing accurate probabilistic estimates. It achieves state-of-the-art performance on INTERACTION and NGSIM and improves over a baseline (MFP) when closest-mode and probabilistic evaluations are used at the same time.



VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation

  • Jiyang Gao, Chen Sun, …, C. Schmid
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
VectorNet is introduced, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. It obtains state-of-the-art performance on the Argoverse dataset.

INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps

An INTERnational, Adversarial and Cooperative moTION dataset (INTERACTION dataset) in interactive driving scenarios with semantic maps for highly complex behavior such as negotiations, aggressive/irrational decisions and traffic rule violations is presented.

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely is proposed, which generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Deep Learning

Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Qualitatively, the proposed RNN Encoder‐Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.

Self-Supervised Action-Space Prediction for Automated Driving

A novel learned multi-modal trajectory prediction architecture for automated driving achieves kinematically feasible predictions by casting the learning problem into the space of accelerations and steering angles, and introduces the novel Self-Supervised Action-Space Prediction (SSP-ASP) architecture that outputs future environment context features in addition to trajectories.

Scene Transformer: A unified multi-task model for behavior prediction and planning

This work demonstrates that formulating the problem of behavior prediction in a unified architecture with a masking strategy may allow us to have a single model that can perform multiple motion prediction and planning related tasks effectively.

TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors

This work proposes TrafficSim, a multi-agent behavior model for realistic traffic simulation that generates significantly more realistic traffic scenarios than a diverse set of baselines. Trajectories generated by TrafficSim can also serve as effective data augmentation for training a better motion planner.

LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving

LookOut is a novel autonomy system that perceives the environment, predicts a diverse set of futures of how the scene might unroll, and estimates the trajectory of the SDV by optimizing a set of contingency plans over these future realizations. It learns a diverse joint distribution over multi-agent future trajectories in a traffic scene that covers a wide range of future modes with high sample efficiency.

What-If Motion Prediction for Autonomous Driving

This work proposes a recurrent graph-based attentional approach with interpretable geometric and social relationships. It supports the injection of counterfactual geometric goals and social contexts, which could be used in the planning loop to reason about unobserved causes or unlikely futures that are directly relevant to the AV's intended route.