TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

Vida Adeli, Mahsa Ehsanpour, Ian D. Reid, Juan Carlos Niebles, Silvio Savarese, Ehsan Adeli, Hamid Rezatofighi
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems. Predicting body dynamics requires capturing subtle information embedded in the humans’ interactions with each other and with the objects present in the scene. In this paper, we propose a novel TRajectory and POse Dynamics (nicknamed TRiPOD) method based on graph attentional networks to model the human-human and… 
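The abstract says TRiPOD builds on graph attentional networks to model interactions among people and, per the preceding sentence, with objects in the scene. Because the abstract is truncated before any architectural detail, the sketch below only illustrates a generic single-head graph attention layer in the style of Veličković et al.'s GAT, applied to a small graph whose nodes are people and objects. The function name `gat_layer`, the feature sizes, and the random weights are hypothetical; this is not TRiPOD's published architecture.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(h, adj, W, a):
    """Hypothetical single-head graph attention layer (generic GAT-style sketch).

    h:   (N, F_in) node features, one row per agent (person or object)
    adj: (N, N) boolean adjacency; adj[i, j] is True if node i attends to node j.
         Must include self-loops so every row has at least one allowed edge.
    W:   (F_in, F_out) shared linear projection
    a:   (2 * F_out,) attention vector
    Returns (aggregated features (N, F_out), attention weights (N, N)).
    """
    z = h @ W  # project every node into the output feature space
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into a source term and a target term
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = leaky_relu(src[:, None] + dst[None, :])
    # Disallowed edges get -inf so they vanish after the softmax
    e = np.where(adj, e, -np.inf)
    alpha = softmax(e, axis=1)
    return alpha @ z, alpha

# Toy scene: 3 people and 2 objects with random (hypothetical) features and weights
rng = np.random.default_rng(0)
num_people, num_objects, f_in, f_out = 3, 2, 4, 8
n = num_people + num_objects
h = rng.normal(size=(n, f_in))
adj = np.ones((n, n), dtype=bool)  # fully connected, self-loops included
W = rng.normal(size=(f_in, f_out))
a = rng.normal(size=(2 * f_out,))
out, alpha = gat_layer(h, adj, W, a)
print(out.shape)  # (5, 8)
```

Masking with the adjacency matrix is what lets such a layer restrict which agents attend to which (for example, keeping separate human-human and human-object edge sets); each row of `alpha` is a probability distribution over the agents that node is allowed to attend to.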

Learning Decoupled Representations for Human Pose Forecasting
This work proposes to learn decoupled representations for the global and local pose forecasting tasks and shows that it is better to stop the prediction when the uncertainty in human motion increases.
Simple Baseline for Single Human Motion Forecasting
This paper establishes a simple but effective baseline for single human motion forecasting without visual and social information, equipped with useful training tricks, and outperforms existing methods by a large margin on the SoMoF benchmark.
Deep variational learning for multiple trajectory prediction of 360° head movements
This article presents an approach to generate multiple plausible futures of head motion in 360° videos, given a common past trajectory, and designs a training procedure to obtain a flexible and lightweight stochastic prediction model compatible with sequence-to-sequence recurrent neural architectures.
HiT-DVAE: Human Motion Generation via Hierarchical Transformer Dynamical VAE
This paper proposes the Hierarchical Transformer Dynamical Variational Autoencoder (HiT-DVAE), which implements auto-regressive generation with transformer-like attention mechanisms, enabling the generative model to learn a more complex, time-varying latent space and to produce diverse and realistic human motions.
ActFormer: A GAN Transformer Framework towards General Action-Conditioned 3D Human Motion Generation
This work presents a GAN Transformer framework for general action-conditioned 3D human motion generation, including not only single-person actions but also multi-person interactive actions, and demonstrates adaptability to various human motion representations.
Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios
This work presents the first systematic comparison of state-of-the-art approaches for behavior forecasting, focusing on autoregressively predicting the future with methods trained for the short-term future, and shows that its findings hold even when highly noisy annotations are used, which opens new horizons toward the use of weakly supervised learning.
Didn't see that coming: a survey on non-verbal social human behavior forecasting
This survey defines the behavior forecasting problem for multiple interactive agents in a generic way, aiming to unify two traditionally separate fields: social signal prediction and human motion forecasting.
ChaLearn LAP Challenges on Self-Reported Personality Recognition and Non-Verbal Behavior Forecasting During Social Dyadic Interactions: Dataset, Design, and Results
This paper summarizes the 2021 ChaLearn Looking at People Challenge on Understanding Social Behavior in Dyadic and Small Group Interactions (DYAD), which featured two tracks: self-reported personality recognition and non-verbal behavior forecasting.
Towards Human Pose Prediction using the Encoder-Decoder LSTM
This work modifies one of the previously used bounding-box prediction architectures to perform the harder task of pose prediction in the SoMoF challenge and demonstrates the effectiveness of the proposed method on the challenge's evaluation metrics.
Multi-Person 3D Motion Prediction with Multi-Range Transformers
A Multi-Range Transformers model, consisting of a local-range encoder for individual motion and a global-range encoder for social interactions, outperforms state-of-the-art methods on long-term 3D motion prediction and generates diverse social interactions.


Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments
We introduce a new dataset, Human3.6M, of 3.6 million accurate 3D human poses, acquired by recording the performance of 5 female and 6 male subjects under 4 different viewpoints, for training…
History Repeats Itself: Human Motion Prediction via Motion Attention
An attention-based feed-forward network is introduced that explicitly leverages the observation that human motion tends to repeat itself, using motion attention to capture the similarity between the current motion context and historical motion sub-sequences.
On Human Motion Prediction Using Recurrent Neural Networks
It is shown that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all, and a simple, scalable RNN architecture is proposed that obtains state-of-the-art performance on human motion prediction.
Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks (NeurIPS 2019)
A graph-based generative adversarial network that generates realistic, multimodal trajectory predictions by better modelling the social interactions of pedestrians in a scene, and achieves state-of-the-art performance compared with several baselines on existing trajectory forecasting benchmarks.
Action-Agnostic Human Pose Forecasting
The triangular-prism recurrent neural network (TP-RNN) models the hierarchical and multi-scale characteristics of human dynamics and captures the latent hierarchical structure in human pose sequences by encoding temporal dependencies with different time-scales.
Learning Progressive Joint Propagation for Human Motion Prediction
A transformer-based architecture with a global attention mechanism is applied that extends predictions from central to peripheral joints according to the skeleton's structural connectivity, and a memory-based dictionary is built to preserve the global motion patterns in the training data and guide the predictions.
Deep neural networks enable quantitative movement analysis using single-camera videos
These methods for quantifying gait pathology with commodity cameras increase access to quantitative motion analysis in clinics and at home and enable researchers to conduct large-scale studies of neurological and musculoskeletal disorders.
Video-Based Motion Trajectory Forecasting Method for Proactive Construction Safety Monitoring Systems
Data indicate that falls, struck-by incidents, and caught-in/between incidents are among the most common types of fatal accidents on construction sites, and the majority of today's accident-prevention efforts focus on preventing these types of accidents.
Contact and Human Dynamics from Monocular Video
A physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input and produces motions that are significantly more realistic than those from purely kinematic methods, substantially improving quantitative measures of both kinematics and dynamic plausibility.
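*On Human Motion Prediction Using Recurrent Neural Networks*, listed above, reports that a baseline which does not model motion at all is competitive. That baseline is the zero-velocity baseline: it repeats the last observed pose for every future frame. The sketch below implements it for 3D joint positions and scores it with mean per-joint position error (MPJPE). The array layout `(frames, joints, 3)`, the function names, and the choice of MPJPE as the metric are assumptions for illustration; evaluation protocols differ across the works listed here.

```python
import numpy as np

def zero_velocity_forecast(observed, horizon):
    """Hypothetical helper: repeat the last observed pose for `horizon` future frames.

    observed: (T_obs, J, 3) array of past 3D joint positions
    Returns a (horizon, J, 3) array.
    """
    last = observed[-1]  # most recent pose, shape (J, 3)
    return np.repeat(last[None], horizon, axis=0)

def mpjpe(pred, target):
    """Mean per-joint position error: Euclidean distance averaged over frames and joints."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())

# Toy sequence: a single joint moving at constant velocity along x
t = np.arange(6, dtype=float)
seq = np.zeros((6, 1, 3))
seq[:, 0, 0] = t  # x = 0, 1, ..., 5
observed, future = seq[:4], seq[4:]
pred = zero_velocity_forecast(observed, horizon=2)
print(pred[:, 0, 0])        # [3. 3.]
print(mpjpe(pred, future))  # 1.5
```

Because the forecast never moves, its error grows with the horizon for any moving joint; it is nonetheless a useful reference point, since a learned model that cannot beat it is not capturing motion at all.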