How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting

Alessio Monti, Angelo Porrello, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara
Accurate prediction of future human positions is an essential task for modern video-surveillance systems. Current state-of-the-art models usually rely on a “history” of past tracked locations (e.g., 3 to 5 seconds) to predict a plausible sequence of future locations (e.g., up to the next 5 seconds). We feel that this common schema neglects critical traits of realistic applications: as the collection of input trajectories involves machine perception (i.e., detection and tracking), incorrect…
Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction
A new prediction pattern is presented that introduces previously overlooked human poses so that the prediction task can be approached from the view of interpolation; it achieves state-of-the-art performance on the H3.6M, CMU-Mocap and 3DPW benchmarks in both short-term and long-term prediction.


Context-Aware Trajectory Prediction
This work proposes a “context-aware” LSTM-based recurrent neural network model that can learn and predict human motion in crowded spaces such as a sidewalk, a museum or a shopping mall, and evaluates the model on public pedestrian data.
AC-VRNN: Attentive Conditional-VRNN for Multi-Future Trajectory Prediction
Trajectron++: Dynamically-Feasible Trajectory Forecasting with Heterogeneous Data
Trajectron++ is a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data and outperforming a wide array of state-of-the-art deterministic and generative methods.
You'll never walk alone: Modeling social behavior for multi-target tracking
A model of dynamic social behavior, inspired by models developed for crowd simulation, is introduced, trained with videos recorded from a bird's-eye view at busy locations, and applied as a motion model for multi-people tracking from a vehicle-mounted camera.
Group LSTM: Group Trajectory Prediction in Crowded Scenarios
This work proposes a novel approach to predicting future trajectories in crowded scenes at the group level: it exploits motion coherency to cluster trajectories that have similar motion trends, so that pedestrians within the same group can be well segmented.
Social LSTM: Human Trajectory Prediction in Crowded Spaces
This work proposes an LSTM model that can learn general human movement and predict future trajectories, and that outperforms state-of-the-art methods on some of the public datasets on which it is evaluated.
One Thousand and One Hours: Self-driving Motion Prediction Dataset
This dataset was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California over a four-month period and forms the largest, most complete and detailed dataset to date for the development of self-driving machine learning tasks such as motion forecasting, planning and simulation.
Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes
This paper contributes a new large-scale dataset of videos of various types of targets navigating a real-world outdoor environment such as a university campus, and introduces a new characterization that describes the “social sensitivity” at which two targets interact.
STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction
This work proposes a Spatial-Temporal Graph Attention network (STGAT), based on a sequence-to-sequence architecture, to predict future trajectories of pedestrians; it achieves superior performance on two publicly available crowd datasets and produces more “socially” plausible trajectories for pedestrians.
Social Attention: Modeling Attention in Human Crowds
This work proposes Social Attention, a novel trajectory prediction model that captures the relative importance of each person when navigating in the crowd, irrespective of their proximity, and demonstrates its performance against a state-of-the-art approach on two publicly available crowd datasets.