# Frequency Domain Transformer Networks for Video Prediction

@article{Farazi2019FrequencyDT, title={Frequency Domain Transformer Networks for Video Prediction}, author={Hafez Farazi and Sven Behnke}, journal={ArXiv}, year={2019}, volume={abs/1903.00271} }

The task of video prediction is forecasting the next frames given some previous frames. Despite much recent progress, this task is still challenging mainly due to high nonlinearity in the spatial domain. To address this issue, we propose a novel architecture, Frequency Domain Transformer Network (FDTN), which is an end-to-end learnable model that estimates and uses the transformations of the signal in the frequency domain. Experimental evaluations show that this approach can outperform some…
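The core idea of estimating a transformation in the frequency domain and reusing it for prediction can be illustrated with classical phase correlation. The NumPy sketch below is not the authors' FDTN architecture: it assumes a single global, circular translation between two grayscale frames, and the function names `predict_next_frame` and `estimate_shift` are hypothetical. Under that assumption, the phase difference between consecutive frames encodes the shift, and applying the same phase difference once more extrapolates the next frame.

```python
import numpy as np


def predict_next_frame(prev, curr, eps=1e-8):
    """Extrapolate one frame, assuming one global circular translation.

    Illustrative only: a fixed phase-shift rule, not a learned model.
    """
    f_prev = np.fft.fft2(prev)
    f_curr = np.fft.fft2(curr)
    # Normalized cross-power spectrum: the (near) unit-magnitude phase
    # difference that maps the previous frame onto the current one.
    cross = f_curr * np.conj(f_prev)
    phase_diff = cross / (np.abs(cross) + eps)
    # Applying the same phase difference again advances the current frame
    # by the same displacement.
    f_next = f_curr * phase_diff
    return np.real(np.fft.ifft2(f_next))


def estimate_shift(prev, curr, eps=1e-8):
    """Return the (rows, cols) circular shift found by phase correlation."""
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    corr = np.real(np.fft.ifft2(cross / (np.abs(cross) + eps)))
    # The correlation surface peaks at the displacement between the frames.
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(int(p) for p in peak)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame0 = rng.random((16, 16))
    frame1 = np.roll(frame0, shift=(1, 2), axis=(0, 1))
    frame2 = predict_next_frame(frame0, frame1)
    print(estimate_shift(frame0, frame1))
    print(np.allclose(frame2, np.roll(frame0, shift=(2, 4), axis=(0, 1)), atol=1e-3))
```

Because the shift is applied as a multiplication of Fourier coefficients, the extrapolated frame is not blurred by interpolation. Real video violates the single-translation assumption, which is where a learned, end-to-end model would be needed.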

## 9 Citations

Motion Segmentation using Frequency Domain Transformer Networks

- Computer Science · ESANN
- 2020

This work proposes a novel end-to-end learnable architecture that predicts the next frame by modeling foreground and background separately while simultaneously estimating and predicting the foreground motion using Frequency Domain Transformer Networks.

Local Frequency Domain Transformer Networks for Video Prediction

- Computer Science · 2021 International Joint Conference on Neural Networks (IJCNN)
- 2021

It is demonstrated that the method is readily extended to perform motion segmentation and account for the scene’s composition, and learns to produce reliable predictions in an entirely interpretable manner by only observing unlabeled video data.

Fourier-based Video Prediction through Relational Object Motion

- Computer Science · ESANN 2021 proceedings
- 2021

This work explores a different approach to video prediction, using frequency-domain methods and explicitly inferring object-motion relationships in the observed scene. The resulting predictions are consistent with the observed dynamics in a scene and do not suffer from blur.

Video Prediction using Local Phase Differences

- 2020

Video prediction is commonly referred to as the task of forecasting future frames of a video sequence provided several past frames thereof. It remains a challenging domain as visual scenes evolve…

Semantic Prediction: Which One Should Come First, Recognition or Prediction?

- Computer Science · ESANN 2021 proceedings
- 2021

This work investigates configurations using the Local Frequency Domain Transformer Network (LFDTN) as the video prediction model and U-Net as the semantic extraction model on synthetic and real datasets.

Taylor Swift: Taylor Driven Temporal Modeling for Swift Future Frame Prediction

- Computer Science · ArXiv
- 2021

TaylorSwiftNet is introduced, a novel convolutional neural network that learns to estimate the higher-order terms of the Taylor series for a given input video and can swiftly predict any desired future frame in just one forward pass and change the temporal resolution on the fly.

Utilizing Temporal Information in Deep Convolutional Network for Efficient Soccer Ball Detection and Tracking

- Computer Science · RoboCup
- 2019

This work presents a novel convolutional neural network approach to detect the soccer ball in an image sequence that exploits spatio-temporal correlation and detects the ball based on the trajectory of its movements.

PISEP^2: Pseudo Image Sequence Evolution based 3D Pose Prediction

- Computer Science · The Visual Computer
- 2021

A skeletal representation is proposed by transforming the joint coordinate sequence into an image sequence, which can model the different correlations of different joints, and a novel inference network is proposed to predict all future poses in one step by decoupling the decoders in a non-recursive manner.

Object-centered Fourier Motion Estimation and Segment-Transformation Prediction

- Computer Science · ESANN
- 2020

An object-centered movement estimation, frame prediction, and correction framework using frequency-domain approaches that transform single objects based on estimated translation and rotation speeds, which are corrected using a learned encoding of the past.

## References

Location Dependency in Video Prediction

- Computer Science · ICANN
- 2018

The results indicate that encoding location-dependent features is crucial for the task of video prediction, and the proposed methods significantly outperform spatially invariant models.

Video Ladder Networks

- Computer Science, Mathematics · ArXiv
- 2016

The basic version of VLN is extended to incorporate ResNet-style residual blocks in the encoder and decoder, which help improve the prediction results.

Video Pixel Networks

- Computer Science · ICML
- 2017

A probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video and generalizes to the motion of novel objects.

Modeling spatiotemporal information with convolutional gated networks

- Computer Science
- 2016

The developed convolutional version of the bilinear model for predicting spatiotemporal data halved the 4-step prediction loss while reducing the number of parameters by a factor of 159 compared to the original model.

Modeling Deep Temporal Dependencies with Recurrent "Grammar Cells"

- Computer Science · NIPS
- 2014

This work shows how a bilinear model of transformations, such as a gated autoencoder, can be turned into a recurrent network by training it to predict future frames from the current one and the inferred transformation using backprop-through-time.

Extension of phase correlation to subpixel registration

- Mathematics, Computer Science · IEEE Trans. Image Process.
- 2002

It is shown that for downsampled images the signal power in the phase correlation is not concentrated in a single peak, but rather in several coherent peaks mostly adjacent to each other.

An FFT-based technique for translation, rotation, and scale-invariant image registration

- Mathematics, Computer Science · IEEE Trans. Image Process.
- 1996

This correspondence discusses an extension of the well-known phase correlation technique to cover translation, rotation, and scaling, which shows excellent robustness against random noise.

Learning to relate images.

- Computer Science, Medicine · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2013

This paper reviews recent work on relational feature learning, analyzes the role that multiplicative interactions play in learning to encode relations, and discusses how square-pooling and complex cell models can be viewed as a way to represent multiplicative interactions and thereby as a way to encode relations.