Knowledge Transfer for Scene-Specific Motion Prediction

@article{Ballan2016KnowledgeTF,
  title={Knowledge Transfer for Scene-Specific Motion Prediction},
  author={Lamberto Ballan and Francesco Castaldo and Alexandre Alahi and Francesco A. N. Palmieri and Silvio Savarese},
  journal={ArXiv},
  year={2016},
  volume={abs/1603.06987}
}
When given a single frame of a video, humans can not only interpret the content of the scene but also forecast the near future. This ability is mostly driven by their rich prior knowledge about the visual world, both in terms of (i) the dynamics of moving agents and (ii) the semantics of the scene. In this work we exploit the interplay between these two key elements to predict scene-specific motion patterns. First, we extract patch descriptors encoding the probability… 
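The abstract is cut off mid-sentence, but its central idea, coupling the dynamics of moving agents with the semantics of the scene, can be made concrete with a small sketch. The snippet below is a hypothetical illustration rather than the paper's implementation: the grid resolution, the 8-direction transition set, and the `navigation_map` and `rollout` names are assumptions introduced here only to show how a scene-specific directional prior could be sampled to produce a candidate future path.

```python
# Hypothetical sketch (not the authors' code): sampling a future path from a
# per-cell directional prior, one way to combine scene knowledge with agent motion.
import numpy as np

# 8 compass directions expressed as (row offset, column offset)
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def rollout(navigation_map, start_cell, n_steps, rng=None):
    """Sample a path over a grid of per-cell directional transition probabilities.

    navigation_map: array of shape (H, W, 8), one probability per direction,
        e.g. estimated from trajectories observed on semantically similar patches.
    start_cell: (row, col) of the agent's current position.
    """
    rng = rng or np.random.default_rng(0)
    h, w, _ = navigation_map.shape
    path = [start_cell]
    r, c = start_cell
    for _ in range(n_steps):
        probs = navigation_map[r, c]
        if probs.sum() == 0:                      # no prior for this cell: stop
            break
        k = rng.choice(len(DIRECTIONS), p=probs / probs.sum())
        dr, dc = DIRECTIONS[k]
        r = int(np.clip(r + dr, 0, h - 1))
        c = int(np.clip(c + dc, 0, w - 1))
        path.append((r, c))
    return path

# Toy usage: a uniform prior everywhere except a "sidewalk" row that pushes agents east.
nav = np.full((10, 10, 8), 1.0 / 8)
nav[5, :, :] = 0.0
nav[5, :, 4] = 1.0                                # direction (0, +1): move right
print(rollout(nav, start_cell=(5, 0), n_steps=6))
```

In a sketch like this, the per-cell probabilities stand in for the scene-specific prior, while the step-by-step rollout stands in for the agent dynamics; the actual method should be taken from the paper itself rather than from this illustration.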
Learning Structured Representations of Spatial and Interactive Dynamics for Trajectory Prediction in Crowded Scenes
TLDR
This work proposes a modular method that utilises a learned model of the environment for motion prediction, allowing for robust and label-efficient forward modelling and relaxing the need for full model re-training in new environments.
CAR-Net: Clairvoyant Attentive Recurrent Network
TLDR
A Clairvoyant Attentive Recurrent Network (CAR-Net) that learns where to look in a large image of the scene when solving the path prediction task, and shows CAR-Net’s ability to generalize to unseen scenes.
Deep Context Maps: Agent Trajectory Prediction Using Location-Specific Latent Maps
TLDR
A novel approach for agent motion prediction in cluttered environments by posing context map learning as a multi-task training problem and describing the map model and its incorporation into a state-of-the-art trajectory predictor.
Learning Occupancy Priors of Human Motion From Semantic Maps of Urban Environments
TLDR
This work applies and discusses a traditional Inverse Optimal Control approach, and proposes a novel approach based on Convolutional Neural Networks (CNN) to predict future occupancy maps, which produces flexible context-aware occupancy estimations for semantically uniform map regions.
Group LSTM: Group Trajectory Prediction in Crowded Scenarios
TLDR
This work proposes a novel approach to predict future trajectories in crowded scenes at the group level, exploiting motion coherency to cluster trajectories with similar motion trends so that pedestrians within the same group can be well segmented.
Generic Probabilistic Interactive Situation Recognition and Prediction: From Virtual to Real
TLDR
A generic probabilistic hierarchical recognition and prediction framework which employs a two-layer Hidden Markov Model (TLHMM) to obtain the distribution of potential situations and a learning-based dynamic scene evolution model to sample a group of future trajectories is proposed.
Pedestrian Path Forecasting in Crowd: A Deep Spatio-Temporal Perspective
  • Yuke Li · ACM Multimedia · 2017
TLDR
A deep spatio-temporal learning-forecasting approach that takes as input previously extracted high-level motion cues and outputs the potential future walking routes of all pedestrians in one shot, achieving large-margin improvements over recent works in the literature.
Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction
TLDR
This work proposes a lightweight attention-based recurrent backbone that acts solely on past observed positions, based on a U-Net architecture, and demonstrates that its prediction accuracy can be improved considerably when combined with a scene-aware goal-estimation module.
DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents
TLDR
The proposed Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of future predictions of multiple interacting agents in dynamic scenes significantly improves the prediction accuracy compared to other baseline methods.
...

References

Showing 1-10 of 51 references
Patch to the Future: Unsupervised Visual Prediction
TLDR
This paper presents a conceptually simple but surprisingly powerful method for visual prediction which combines the effectiveness of mid-level visual elements with temporal modeling and shows that it is comparable to supervised methods for event prediction.
Predicting Object Dynamics in Scenes
TLDR
This paper learns from sequences of abstract images gathered using crowd-sourcing to overcome the lack of densely annotated spatiotemporal data, and demonstrates qualitatively and quantitatively that its models produce plausible scene predictions on both abstract images and natural images taken from the Internet.
Anticipating Visual Representations from Unlabeled Video
TLDR
This work presents a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects, and applies recognition algorithms to the predicted representations to anticipate objects and actions.
Joint inference of groups, events and human roles in aerial videos
TLDR
This paper addresses a new problem of parsing low-resolution aerial videos of large spatial areas in terms of 1) grouping people, 2) recognizing events, and 3) assigning roles to people engaged in events, using a spatiotemporal AND-OR graph.
Unsupervised Learning of Functional Categories in Video Scenes
TLDR
This work presents a novel form of video scene analysis where scene element categories such as roads, parking areas, sidewalks and entrances, can be segmented and categorized based on the behaviors of moving objects in and around them.
A Data-Driven Approach for Event Prediction
TLDR
This work presents a simple method to identify videos with unusual events in a large collection of short video clips, inspired by recent approaches in computer vision that rely on large databases, and shows how a very simple retrieval model is able to provide reliable results.
Activity Forecasting
TLDR
The unified model uses state-of-the-art semantic scene understanding combined with ideas from optimal control theory to achieve accurate activity forecasting and shows how the same techniques can improve the results of tracking algorithms by leveraging information about likely goals and trajectories.
Learning an Image-Based Motion Context for Multiple People Tracking
TLDR
A novel method for multiple people tracking that leverages a generalized model for capturing interactions among individuals which is able to encode the effect of undetected targets, making the tracker more robust to partial occlusions.
Learning Semantic Scene Models by Trajectory Analysis
TLDR
An unsupervised learning framework to segment a scene into semantic regions and to build semantic scene models from long-term observations of moving objects in the scene is described, and novel clustering algorithms that use both similarity and comparison confidence are introduced.
Anticipating the future by watching unlabeled video
TLDR
A large-scale framework that capitalizes on temporal structure in unlabeled video to learn to anticipate both actions and objects in the future, suggesting that learning with unlabeled videos significantly helps forecast actions and anticipate objects.
...