• Corpus ID: 244799218

TCTN: A 3D-Temporal Convolutional Transformer Network for Spatiotemporal Predictive Learning

Ziao Yang, Xiangru Yang, Qifeng Lin
Spatiotemporal predictive learning aims to generate future frames given a sequence of historical frames. Conventional algorithms are mostly based on recurrent neural networks (RNNs). However, RNNs suffer from a heavy computational burden, in terms of both time and a long back-propagation process, due to the serial nature of the recurrent structure. Recently, Transformer-based methods have also been investigated in the form of encoder-decoder or plain encoder, but the encoder-decoder form requires too deep networks and… 

PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs
A predictive recurrent neural network (PredRNN) that achieves state-of-the-art prediction performance on three video prediction datasets and is a more general framework that can be easily extended to other predictive learning tasks by integrating with other architectures.
PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning
A Gradient Highway architecture is proposed, which provides alternative shorter routes for gradient flows from outputs back to long-range inputs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively and to ease the vanishing gradient problem.
Temporal Convolutional Networks for Action Segmentation and Detection
A class of temporal models that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection, which are capable of capturing action compositions, segment durations, and long-range dependencies, and are over an order of magnitude faster to train than competing LSTM-based recurrent neural networks.
Temporal Convolutional Networks: A Unified Approach to Action Segmentation
This work proposes a unified approach, as demonstrated by the Temporal Convolutional Network (TCN), that hierarchically captures relationships at low-level time-scales, and can be trained in a fraction of the time it takes to train an RNN.
Self-Attention ConvLSTM for Spatiotemporal Prediction
A novel self-attention memory (SAM) is proposed to memorize features with long-range dependencies in both the spatial and temporal domains, and is applied to frame prediction on the MovingMNIST and KTH datasets and to traffic flow prediction on the TaxiBJ dataset.
Unsupervised Learning of Video Representations using LSTMs
This work uses Long Short Term Memory networks to learn representations of video sequences and evaluates the representations by finetuning them for a supervised learning problem - human action recognition on the UCF-101 and HMDB-51 datasets.
Attention is All you Need
A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
This paper proposes the convolutional LSTM (ConvLSTM) and uses it to build an end-to-end trainable model for the precipitation nowcasting problem and shows that it captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm.
Spatiotemporal Convolutional LSTM for Radar Echo Extrapolation
A novel spatiotemporal convolutional long short-term memory (ST-ConvLSTM) network for radar echo extrapolation that adopts the attention mechanism to model long-range and long-term spatiotemporal dependency, and thus obtains enhanced spatiotemporal representation capabilities.
Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction
A deep-learning-based approach to collectively forecast the inflow and outflow of crowds in each and every region of a city, using the residual neural network framework to model the temporal closeness, period, and trend properties of crowd traffic.