MoDeRNN: Towards Fine-Grained Motion Details for Spatiotemporal Predictive Learning

@inproceedings{Chai2022MoDeRNN,
  title={MoDeRNN: Towards Fine-Grained Motion Details for Spatiotemporal Predictive Learning},
  author={Zenghao Chai and Zhengzhuo Xu and Chun Yuan},
  booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2022}
}
  • Published 25 October 2021
Spatiotemporal predictive learning (ST-PL) aims to predict subsequent frames from a limited observed sequence, and it has broad real-world applications. However, learning representative spatiotemporal features for prediction is challenging, and chaotic uncertainty among consecutive frames exacerbates the difficulty of long-term prediction. This paper concentrates on improving prediction quality by enhancing the correspondence between the previous context and the current state… 

PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning

A Gradient Highway architecture is proposed, which provides alternative shorter routes for gradient flows from outputs back to long-range inputs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively and to ease the vanishing gradient problem.

Self-Attention ConvLSTM for Spatiotemporal Prediction

A novel self-attention memory (SAM) is proposed to memorize features with long-range dependencies in both the spatial and temporal domains; it is applied to frame prediction on the MovingMNIST and KTH datasets and to traffic flow prediction on the TaxiBJ dataset.
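
The core mechanism can be illustrated with a generic scaled dot-product self-attention over flattened spatial positions; this is a minimal sketch, not the paper's exact SAM module, and the function and parameter names (`spatial_self_attention`, `Wq`, `Wk`, `Wv`) are hypothetical:

```python
import numpy as np

def spatial_self_attention(h, Wq, Wk, Wv):
    """Scaled dot-product self-attention over flattened spatial positions.

    h:  (N, C) hidden features, one row per spatial position (N = H * W).
    Wq, Wk, Wv: (C, C) projection matrices for queries, keys, values.
    Returns (N, C) features where each position aggregates all others,
    giving long-range spatial dependencies in a single step.
    """
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])       # (N, N) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over positions
    return attn @ v
```

In SAM-style cells such an attention output is additionally fused with a memory state; the sketch above only shows the attention aggregation itself.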

PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning

This paper presents PredRNN, a new recurrent network in which a pair of memory cells are explicitly decoupled, operate in nearly independent transition manners, and finally form unified representations of the complex environment.

PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs

A predictive recurrent neural network (PredRNN) is presented that achieves state-of-the-art prediction performance on three video prediction datasets and serves as a more general framework that can be easily extended to other predictive learning tasks by integrating with other architectures.

Convolutional Tensor-Train LSTM for Spatio-temporal Learning

A higher-order convolutional LSTM model is proposed that can efficiently learn long-term spatio-temporal correlations in video sequences along with a succinct representation of the history; it outperforms existing approaches while using only a fraction of the parameters of the baseline models.

Learning to Decompose and Disentangle Representations for Video Prediction

The Decompositional Disentangled Predictive Auto-Encoder (DDPAE) is proposed, a framework that combines structured probabilistic models and deep networks to automatically decompose the high-dimensional video that the authors aim to predict into components, and disentangle each component to have low-dimensional temporal dynamics that are easier to predict.

Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

This paper proposes the convolutional LSTM (ConvLSTM) and uses it to build an end-to-end trainable model for the precipitation nowcasting problem and shows that it captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm.
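
The key idea of ConvLSTM is to replace the matrix multiplications of FC-LSTM with convolutions so the states keep their spatial layout. Below is a minimal single-channel sketch of one update step under assumed 3x3 kernels; the naive convolution helper and all names (`conv2d_same`, `convlstm_step`, `Wx`, `Wh`) are illustrative, not the paper's implementation:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 2D convolution (cross-correlation) with zero padding,
    so the output has the same (H, W) shape as the input."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, Wx, Wh, b):
    """One ConvLSTM step for a single-channel (H, W) frame.

    Wx, Wh: dicts of (3, 3) kernels for gates 'i', 'f', 'o', 'g';
    b: dict of scalar biases. Gates are computed by convolving the
    input and the previous hidden state instead of multiplying them
    by dense weight matrices as in FC-LSTM.
    """
    z = {g: conv2d_same(x, Wx[g]) + conv2d_same(h, Wh[g]) + b[g] for g in "ifog"}
    i, f, o = sigmoid(z["i"]), sigmoid(z["f"]), sigmoid(z["o"])
    g = np.tanh(z["g"])
    c_new = f * c + i * g        # cell state: same recurrence as LSTM, per pixel
    h_new = o * np.tanh(c_new)   # hidden state preserves spatial structure
    return h_new, c_new
```

Running this step over a frame sequence, feeding each prediction back in, gives the end-to-end rollout that nowcasting models build on.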

Disentangling Multiple Features in Video Sequences Using Gaussian Processes in Variational Autoencoders

This work introduces MGP-VAE, a variational autoencoder which uses Gaussian processes to model the latent space for the unsupervised learning of disentangled representations in video sequences, and introduces a novel geodesic loss function which takes into account the curvature of the data manifold to improve learning.

Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model

This work goes beyond ConvLSTM and proposes the Trajectory GRU (TrajGRU) model that can actively learn the location-variant structure for recurrent connections, and provides a benchmark that includes a real-world large-scale dataset from the Hong Kong Observatory.

Memory in Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity From Spatiotemporal Dynamics

The Memory In Memory networks and corresponding recurrent blocks exploit the differential signals between adjacent recurrent states to model the non-stationary and approximately stationary properties in spatiotemporal dynamics with two cascaded, self-renewed memory modules.
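
The differential-signal idea can be illustrated with plain first-order differencing of hidden states: a deterministic linear trend is non-stationary in its mean, but one difference turns it into a constant sequence. This is only a toy sketch of the motivation, not the MIM block itself, and `difference_states` is a hypothetical name:

```python
import numpy as np

def difference_states(states):
    """First-order differential signals D_t = H_t - H_{t-1} between
    adjacent recurrent states. states: (T, ...) array -> (T-1, ...)."""
    return states[1:] - states[:-1]

# Toy (T, H, W) stack of hidden states following a linear trend:
trend = np.arange(6, dtype=float).reshape(6, 1, 1)
diff = difference_states(trend)  # constant (all ones): trend removed
```

In MIM, such differential signals are fed through cascaded, self-renewed memory modules so that the stationary residual and the non-stationary component are modeled separately.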