Multimodal Motion Prediction with Stacked Transformers

Yicheng Liu, Jinghuai Zhang, Liangji Fang, Qinhong Jiang, Bolei Zhou
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Predicting multiple plausible future trajectories of nearby vehicles is crucial for the safety of autonomous driving. Recent motion prediction approaches attempt to achieve such multimodal motion prediction by implicitly regularizing the features or by explicitly generating multiple candidate proposals. However, multimodality remains challenging: the latent features may concentrate on the most frequent mode of the data, while proposal-based methods depend largely on prior knowledge to generate…
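As a rough illustration of the proposal-based family the abstract contrasts with (a hypothetical toy sketch, not the paper's mmTransformer code): generate a fixed set of trajectory proposals, one per candidate heading, then select the one most consistent with an observed motion cue.

```python
import math

def make_proposals(speed, headings, horizon=3):
    """One straight-line rollout per candidate heading, starting at the origin.
    A stand-in for learned proposal generation."""
    return [[(speed * t * math.cos(h), speed * t * math.sin(h))
             for t in range(1, horizon + 1)]
            for h in headings]

def best_proposal(proposals, observed_heading):
    """Pick the index of the proposal whose direction best matches the
    most recently observed heading (a toy scoring rule)."""
    def angle_gap(traj):
        x, y = traj[0]
        return abs(math.atan2(y, x) - observed_heading)
    return min(range(len(proposals)), key=lambda i: angle_gap(proposals[i]))
```

Keeping all proposals (rather than only the best) is what yields the multiple modes; the challenge the abstract points to is choosing a proposal set rich enough to cover rare maneuvers.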

ATDS: Adaptive Temporal and Dual Spatial Encoders for Motion Forecasting

This paper proposes ATDS, a graph-based framework comprising an adaptive temporal feature encoder and a dual spatial feature encoder for motion forecasting, which outperforms other baseline methods in both the probability assigned to the best predicted trajectory and prediction accuracy.

Motion Transformer with Global Intention Localization and Local Movement Refinement

The Motion Transformer (MTR) framework is proposed, which models motion prediction as the joint optimization of global intention localization and local movement refinement, and incorporates spatial intention priors by adopting a small set of learnable motion query pairs.

LTP: Lane-based Trajectory Prediction for Autonomous Driving

A two-stage proposal-based motion forecasting method that exploits sliced lane segments as fine-grained, shareable, and interpretable proposals, together with a variance-based non-maximum suppression strategy to select representative trajectories and ensure the diversity of the final output.
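The summary names non-maximum suppression over trajectories only at a high level. A minimal sketch of plain trajectory NMS (a hypothetical endpoint-distance variant, not LTP's variance-based rule) looks like this:

```python
def trajectory_nms(trajectories, scores, min_dist=2.0):
    """Greedy NMS over trajectories: keep the highest-scoring one, then
    drop any remaining trajectory whose endpoint lies within min_dist
    of an already-kept endpoint. Returns indices of kept trajectories."""
    order = sorted(range(len(trajectories)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        ex, ey = trajectories[i][-1]
        if all((ex - trajectories[j][-1][0]) ** 2 +
               (ey - trajectories[j][-1][1]) ** 2 >= min_dist ** 2
               for j in kept):
            kept.append(i)
    return kept
```

Suppressing near-duplicate endpoints is what keeps the surviving set diverse rather than K copies of the single most likely maneuver.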

DCMS: Motion Forecasting with Dual Consistency and Multi-Pseudo-Target Supervision

The core of DCMS is the proposed Dual Consistency Constraints, which regularize the predicted trajectories under spatial and temporal perturbations during training, together with a novel self-ensembling scheme that obtains accurate pseudo-targets for modeling multi-modality through explicit supervision with multiple targets, namely Multi-Pseudo-Target supervision.

Jointly Learning Agent and Lane Information for Multimodal Trajectory Prediction

This paper uses lanes as scene data and proposes a novel network that Jointly learns Agent and Lane information for Multimodal Trajectory Prediction (JAL-MTP), which predicts accurate and reasonable multimodal trajectories from both quantitative and qualitative perspectives.

Exploring Map-based Features for Efficient Attention-based Vehicle Motion Prediction

This work explores how to achieve competitive performance on the Argoverse 1.0 benchmark using efficient attention-based models, which take as input the past trajectories and map-based features derived from minimal map information to ensure efficient and reliable motion prediction.

Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving

3D detection experiments on the Waymo Open Dataset show that the proposed framework outperforms classical unsupervised approaches, is competitive even with its supervised counterpart, and generates highly promising results in open-set 3D detection and trajectory prediction.

LatentFormer: Multi-Agent Transformer-Based Interaction Modeling and Trajectory Prediction

LatentFormer, a transformer-based model for predicting future vehicle trajectories, is proposed; it leverages a novel technique for modeling interactions among dynamic objects in the scene, achieves state-of-the-art performance, and improves trajectory metrics by up to 40%.

HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction

Hierarchical Vector Transformer is proposed for fast and accurate multi-agent motion prediction by decomposing the problem into local context extraction and global interaction modeling, which can effectively and efficiently model a large number of agents in the scene.

Diverse Human Motion Prediction Guided by Multi-level Spatial-Temporal Anchors

An interaction-enhanced spatial-temporal graph convolutional network (IE-STGCN) encodes prior knowledge of human motions and incorporates the anchors into it, outperforming the state of the art in both stochastic and deterministic prediction.



TPNet: Trajectory Proposal Network for Motion Prediction

This work proposes a novel two-stage motion prediction framework, Trajectory Proposal Network (TPNet), which first generates a candidate set of future trajectories as hypothesis proposals, then makes the final predictions by classifying and refining the proposals that meet the physical constraints.

Convolutional Social Pooling for Vehicle Trajectory Prediction

  • Nachiket Deo, M. Trivedi
  • Computer Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2018
This paper proposes an LSTM encoder-decoder model that uses convolutional social pooling as an improvement over social pooling layers for robustly learning interdependencies in vehicle motion, and outputs a multi-modal predictive distribution over future trajectories based on maneuver classes.
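The social-pooling idea starts from a spatial grid of neighbor states around the ego vehicle. A minimal sketch of building that grid (a simplified stand-in for the paper's social tensor, before any convolutional layers; all names here are hypothetical):

```python
def social_grid(ego_pos, neighbors, cell=4.0, rows=3, cols=3):
    """Place each neighbor's feature into the grid cell corresponding to
    its position relative to the ego vehicle; neighbors outside the grid
    are ignored. `neighbors` is a list of ((x, y), feature) pairs."""
    grid = [[None] * cols for _ in range(rows)]
    for (x, y), feat in neighbors:
        r = int((y - ego_pos[1]) // cell) + rows // 2
        c = int((x - ego_pos[0]) // cell) + cols // 2
        if 0 <= r < rows and 0 <= c < cols:
            grid[r][c] = feat
    return grid
```

In the paper's setting the features are LSTM hidden states and the grid is fed to convolutional pooling layers, so that nearby vehicles influence the prediction according to their spatial arrangement rather than as an unordered set.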

Social LSTM: Human Trajectory Prediction in Crowded Spaces

This work proposes an LSTM model which can learn general human movement and predict their future trajectories and outperforms state-of-the-art methods on some of these datasets.

DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

The proposed Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of future predictions of multiple interacting agents in dynamic scenes significantly improves the prediction accuracy compared to other baseline methods.

Multiple Futures Prediction

A probabilistic framework that efficiently learns latent variables to jointly model the multi-step future motions of agents in a scene and can be used for planning via computing a conditional probability density over the trajectories of other agents given a hypothetical rollout of the ego agent.

Transformer Networks for Trajectory Forecasting

This work considers both the original Transformer Network (TF) and the larger Bidirectional Transformer (BERT), state-of-the-art on all natural language processing tasks, and proposes the novel use of Transformer Networks for trajectory forecasting.

Rules of the Road: Predicting Driving Behavior With a Convolutional Model of Semantic Interactions

A unified representation is presented which encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context and empirically show that one can effectively learn fundamentals of driving behavior.

TNT: Target-driveN Trajectory Prediction

The key insight is that for prediction within a moderate time horizon, the future modes can be effectively captured by a set of target states, which leads to the target-driven trajectory prediction (TNT) framework.
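A toy sketch of the target-driven idea (hypothetical, not TNT's learned target-conditioned motion estimator): enumerate candidate target states, then produce one trajectory per target.

```python
def target_driven_predict(start, targets, horizon=4):
    """For each candidate target state, produce a trajectory by linear
    interpolation from the start position over `horizon` steps. TNT
    instead regresses each trajectory with a learned model and scores
    targets by likelihood."""
    trajs = []
    for tx, ty in targets:
        trajs.append([(start[0] + (tx - start[0]) * t / horizon,
                       start[1] + (ty - start[1]) * t / horizon)
                      for t in range(1, horizon + 1)])
    return trajs
```

The point of the decomposition is that the discrete target set carries the multimodality, so each per-target trajectory regression can stay unimodal.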

VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation

  • Jiyang Gao, Chen Sun, C. Schmid
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
VectorNet is introduced, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components and obtains state-of-the-art performance on the Argoverse dataset.
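The vectorized representation that VectorNet builds on can be sketched simply: each map or trajectory polyline becomes a sequence of displacement vectors tagged with its polyline id (an illustrative fragment only; the paper additionally attaches attribute features and learns subgraph node embeddings).

```python
def vectorize_polyline(points, polyline_id):
    """Turn a polyline (list of 2-D points) into VectorNet-style vectors:
    one (start_point, end_point, polyline_id) tuple per segment."""
    return [(points[i], points[i + 1], polyline_id)
            for i in range(len(points) - 1)]
```

Keeping the polyline id on every vector is what lets a hierarchical graph network first pool vectors within each polyline, then model interactions across polylines.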

What-If Motion Prediction for Autonomous Driving

This work proposes a recurrent graph-based attentional approach with interpretable geometric and social relationships that supports the injection of counterfactual geometric goals and social contexts that could be used in the planning loop to reason about unobserved causes or unlikely futures that are directly relevant to the AV's intended route.