Dual Motion GAN for Future-Flow Embedded Video Prediction

@inproceedings{Liang2017DualMG,
  title={Dual Motion GAN for Future-Flow Embedded Video Prediction},
  author={Xiaodan Liang and Lisa Lee and Wei Dai and Eric P. Xing},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={1762-1770}
}
Future frame prediction in videos is a promising avenue for unsupervised video representation learning. […] Key Method: The primal future-frame prediction and dual future-flow prediction form a closed loop, generating informative feedback signals to each other for better video prediction.
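The closed loop described above can be illustrated with a minimal PyTorch sketch (this is not the authors' architecture; the two single-convolution "generators" below are placeholders): a primal generator predicts the next frame, a dual generator predicts the future flow, and warping the last observed frame with the predicted flow yields a consistency signal that couples the two predictions.

    import torch
    import torch.nn.functional as F

    def warp(frame, flow):
        # Backward-warp frame (N,C,H,W) with flow (N,2,H,W) via grid_sample.
        n, _, h, w = frame.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys)).float().to(frame.device)  # (2,H,W), (x,y) order
        coords = base.unsqueeze(0) + flow                      # displaced sample points
        cx = 2.0 * coords[:, 0] / (w - 1) - 1.0                # normalize to [-1,1]
        cy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        return F.grid_sample(frame, torch.stack((cx, cy), dim=-1), align_corners=True)

    frame_gen = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder frame generator
    flow_gen = torch.nn.Conv2d(3, 2, 3, padding=1)   # placeholder flow generator

    last_frame = torch.rand(1, 3, 64, 64)
    pred_frame = frame_gen(last_frame)               # primal: future-frame prediction
    pred_flow = flow_gen(last_frame)                 # dual: future-flow prediction
    # Feedback signal: the flow-warped frame should agree with the predicted frame.
    consistency_loss = F.l1_loss(warp(last_frame, pred_flow), pred_frame)

In training, a consistency term like this would be combined with the adversarial losses of the two GAN branches.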

Predicting Future Frames Using Retrospective Cycle GAN
  • Y. Kwon, M. Park
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
This paper proposes a unified generative adversarial network for predicting accurate and temporally consistent future frames over time, even in challenging environments, and employs two discriminators: one to identify fake frames and another to distinguish image sequences containing fake frames from real sequences.
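The two-discriminator setup summarized above can be sketched as follows (shapes and layer choices are illustrative, not the paper's networks): one discriminator scores single frames, while the other scores whole clips with the frames stacked along the channel axis so temporal inconsistencies become visible.

    import torch
    import torch.nn as nn

    T = 4  # clip length (illustrative)
    frame_disc = nn.Sequential(        # judges one frame (N,3,H,W)
        nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))
    seq_disc = nn.Sequential(          # judges a clip, frames stacked on channels
        nn.Conv2d(3 * T, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

    clip = torch.rand(2, T, 3, 64, 64)        # batch of 4-frame clips
    frame_score = frame_disc(clip[:, -1])     # realism of the last frame alone
    seq_score = seq_disc(clip.flatten(1, 2))  # consistency of the whole sequence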
Photo-Realistic Video Prediction on Natural Videos of Largely Changing Frames
TLDR
A deep residual network with a hierarchical architecture in which each layer predicts a future state at a different spatial resolution; these per-layer predictions are merged via top-down connections to generate future frames that are perceptually more realistic than the baselines.
Predicting Video Frames Using Feature Based Locally Guided Objectives
TLDR
This paper presents a feature-reconstruction-based approach using Generative Adversarial Networks (GANs) to the problem of predicting future frames from natural video scenes, and proposes two novel objective functions, Locally Guided Gram Loss (LGGL) and Multi-Scale Correlation Loss (MSCL), to further enhance the quality of the predicted frames.
Flow-Grounded Spatial-Temporal Video Prediction from Still Images
TLDR
This work formulates multi-frame prediction as a multiple-time-step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase; this decomposition keeps the model from working directly in the high-dimensional pixel space of the frame sequence and is demonstrated to predict better and more diverse results.
Adaptive Hierarchical Motion-Focused Model for Video Prediction
TLDR
An adaptive hierarchical motion-focused model is introduced to predict realistic future frames; it combines hierarchical motion modeling with an adaptive transformation strategy to achieve better motion understanding and application.
Video prediction: a step-by-step improvement of a video synthesis network
TLDR
An accompanying convolution model and a corresponding algorithm for improving image sharpness are proposed, and experimental results demonstrate the effectiveness of the framework.
Structure Preserving Video Prediction
TLDR
An RNN structure for video prediction is proposed, which employs temporal-adaptive convolutional kernels to capture time-varying motion patterns as well as tiny objects within a scene.
FutureGAN: Anticipating the Future Frames of Video Sequences Using Spatio-Temporal 3D Convolutions in Progressively Growing GANs
  • Sandra Aigner, Marco Körner
  • Computer Science
    The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • 2019
TLDR
A new encoder-decoder GAN model that predicts future frames of a video sequence conditioned on a sequence of past frames; the model is applicable to various datasets without additional changes, while achieving stable results competitive with the state of the art in video prediction.
Multi-Scale Attention Generative Adversarial Networks for Video Frame Interpolation
TLDR
A multi-scale dense attention generative adversarial network is proposed for video frame interpolation; results on several datasets demonstrate that the approach attains higher performance and produces more photo-realistic in-between frames compared with previous works.
Video Frame Prediction by Deep Multi-Branch Mask Network
TLDR
A deep multi-branch mask network (DMMNet) is constructed which adaptively fuses the advantages of optical-flow warping and RGB pixel synthesis methods, providing a more flexible masking network for motion and appearance fusion in video frame prediction.
...

References

Showing 1-10 of 41 references
Deep multi-scale video prediction beyond mean square error
TLDR
This work trains a convolutional network to generate future frames given an input sequence and proposes three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function.
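The image gradient difference loss named in this summary penalizes mismatches between the spatial gradients of predicted and target frames, which sharpens edges that a plain MSE loss tends to blur. A minimal sketch, with the loss exponent assumed to be 1:

    import torch

    def gradient_difference_loss(pred, target):
        # Compare absolute horizontal and vertical image gradients (alpha = 1).
        dx_p = (pred[..., :, 1:] - pred[..., :, :-1]).abs()
        dx_t = (target[..., :, 1:] - target[..., :, :-1]).abs()
        dy_p = (pred[..., 1:, :] - pred[..., :-1, :]).abs()
        dy_t = (target[..., 1:, :] - target[..., :-1, :]).abs()
        return (dx_p - dx_t).abs().mean() + (dy_p - dy_t).abs().mean()

    loss = gradient_difference_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))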
Video Frame Synthesis Using Deep Voxel Flow
TLDR
This work addresses the problem of synthesizing new video frames in an existing video, either in-between existing frames (interpolation) or subsequent to them (extrapolation), by training a deep network that learns to synthesize video frames by flowing pixel values from existing ones, a technique called deep voxel flow.
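The "flowing pixel values" idea can be sketched by sampling from both input frames along a predicted flow and blending the two samples with a soft mask. This reuses the warp helper from the earlier sketch; the flow and mask would come from the trained network, and the sign convention here is an assumption, not the paper's exact formulation:

    def voxel_flow_synthesis(frame0, frame1, flow, mask):
        # mask in [0,1] decides, per pixel, how much comes from each input frame
        from_prev = warp(frame0, -flow)  # sample backward along the flow
        from_next = warp(frame1, flow)   # sample forward along the flow
        return mask * from_prev + (1.0 - mask) * from_next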
Generating Videos with Scene Dynamics
TLDR
A generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background is proposed; it can generate tiny videos up to a second long at full frame rate, outperforming simple baselines.
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
TLDR
A novel approach that models future frames in a probabilistic manner is proposed: a Cross Convolutional Network that aids in synthesizing future frames by encoding image and motion information as feature maps and convolutional kernels, respectively.
Spatio-temporal video autoencoder with differentiable memory
TLDR
A spatio-temporal video autoencoder is presented whose temporal decoder is a robust optical-flow prediction module paired with an image sampler that serves as a built-in feedback loop; one direct application, weakly supervised semantic segmentation of videos through flow-based label propagation, is demonstrated.
Action-Conditional Video Prediction using Deep Networks in Atari Games
TLDR
This paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs and proposes and evaluates two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks.
Next-Flow: Hybrid Multi-Tasking with Next-Frame Prediction to Boost Optical-Flow Estimation in the Wild
TLDR
This work seeks to boost CNN-based flow estimation in real scenes with the help of the freely available self-supervised task of next-frame prediction, by training the network in a hybrid way with a novel time-variant multi-tasking architecture.
Unsupervised Learning of Long-Term Motion Dynamics for Videos
TLDR
An unsupervised representation learning approach that compactly encodes the motion dependencies in videos and demonstrates the effectiveness of the learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D.
Anticipating Visual Representations from Unlabeled Video
TLDR
This work presents a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects, applying recognition algorithms to the predicted representations to anticipate objects and actions.
Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning
TLDR
The results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
...