• Corpus ID: 239998098

Taylor Swift: Taylor Driven Temporal Modeling for Swift Future Frame Prediction

  title={Taylor Swift: Taylor Driven Temporal Modeling for Swift Future Frame Prediction},
  author={Mohammad Saber Pourheydari and Mohsen Fayyaz and Emad Bahrami Rad and Mehdi Noroozi and Juergen Gall},
While recurrent neural networks (RNNs) demonstrate outstanding capabilities in future video frame prediction, they model dynamics in a discrete time space and sequentially go through all frames until the desired future temporal step is reached. RNNs are therefore prone to accumulate the error as the number of future frames increases. In contrast, partial differential equations (PDEs) model physical phenomena like dynamics in continuous time space, however, current PDE-based approaches… 

Figures and Tables from this paper


Spatio-temporal video autoencoder with differentiable memory
One direct application of the proposed framework in weakly-supervised semantic segmentation of videos through label propagation using optical flow is presented, using as temporal decoder a robust optical flow prediction module together with an image sampler serving as built-in feedback loop.
Learning to Generate Long-term Future via Hierarchical Prediction
This model is built with a combination of LSTM and analogy based encoder-decoder convolutional neural networks, which independently predict the video structure and generate the future frames, respectively, which prevents pixel-level error propagation from happening by removing the need to observe the predicted frames.
Flow-Grounded Spatial-Temporal Video Prediction from Still Images
This work forms the multi-frame prediction task as a multiple time step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase, which prevents the model from directly looking at the high-dimensional pixel space of the frame sequence and is demonstrated to be more effective in predicting better and diverse results.
Folded Recurrent Neural Networks for Future Video Prediction
This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer that allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems.
Deep multi-scale video prediction beyond mean square error
This work trains a convolutional network to generate future frames given an input sequence and proposes three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function.
Learning the Spatio-Temporal Dynamics of Physical Processes from Partial Observations
A data-driven framework is proposed, where the system’s dynamics are modeled by an unknown time-varying differential equation and the evolution term for the state is estimated from the partially observed data only, using a deep convolutional neural network.
Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models
DILATE (DIstortion Loss including shApe and TimE), a new objective function for training deep neural networks that aims at accurately predicting sudden changes, is introduced, and explicitly incorporates two terms supporting precise shape and temporal change detection.
PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning
A Gradient Highway architecture is proposed, which provides alternative shorter routes for gradient flows from outputs back to long-range inputs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively and to ease the vanishing gradient problem.
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
This paper proposes the convolutional LSTM (ConvLSTM) and uses it to build an end-to-end trainable model for the precipitation nowcasting problem and shows that it captures spatiotemporal correlations better and consistently outperforms FC-L STM and the state-of-the-art operational ROVER algorithm.
PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs
A predictive recurrent neural network (PredRNN) that achieves the state-of-the-art prediction performance on three video prediction datasets and is a more general framework, that can be easily extended to other predictive learning tasks by integrating with other architectures.