Non-Adversarial Video Synthesis with Learned Priors

  • Abhishek Aich, Akash Gupta, Rameswar Panda, Rakib Hyder, M. Salman Asif, Amit K. Roy-Chowdhury
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Most existing work in video synthesis focuses on generating videos using adversarial learning. Despite their success, these methods often require an input reference frame or fail to generate diverse videos from the given data distribution, with little to no uniformity in the quality of the videos that can be generated. Different from these methods, we focus on the problem of generating videos from latent noise vectors, without any reference input frames. To this end, we develop a novel approach…
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
This work rethinks the traditional image + video discriminator pair and designs a holistic discriminator that aggregates temporal information by simply concatenating frames' features, which decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024² videos for the first time.
ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation
A novel architecture, Adaptive Latent Attention Network (ALANET), is introduced, which synthesizes sharp high frame-rate videos with no prior knowledge of input frames being blurry or not, thereby performing the task of both deblurring and interpolation.
3D-Aware Video Generation
This work develops a GAN framework that synthesizes 3D video supervised only with monocular videos and learns a rich embedding of decomposable 3D structures and motions that enables new visual effects of spatio-temporal renderings while producing imagery with quality comparable to that of existing 3D or video GANs.
ArrowGAN: Learning to Generate Videos by Learning Arrow of Time
Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations
A novel iterative algorithm, Geometric TRAnsformed Perturbations (GEO-TRAP), is proposed for attacking video classification models; it achieves new state-of-the-art results under black-box settings on two large datasets.
Ada-VSR: Adaptive Video Super-Resolution with Meta-Learning
Meta-learning is employed to obtain adaptive parameters, using a large-scale external dataset, that can adapt quickly to the novel condition of the given test video during the internal learning task, thereby exploiting external and internal information of a video for super-resolution.
Video Reenactment as Inductive Bias for Content-Motion Disentanglement
This work introduces MTC-VAE, a self-supervised motion-transfer VAE model to disentangle motion and content from videos, which adopts a chunk-wise modeling approach and takes advantage of the motion information contained in spatiotemporal neighborhoods.
Adapting Across Domains by Aligning Representations and Images
Adapting Across Domains by Aligning Representations and Images helps clarify the role of language and image in the design of knowledge representation.
Poisson2Sparse: Self-Supervised Poisson Denoising From a Single Image
This work explores a sparsity and dictionary learning-based approach and presents a novel self-supervised learning method for single-image denoising where the noise is approximated as a Poisson process, requiring no clean ground-truth data.
Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
Spatio-Temporal Representation Factorization (STRF), a flexible new computational unit that can be used in conjunction with most existing 3D convolutional neural network architectures for re-ID, is proposed. Experiments show that STRF improves the performance of various existing baseline architectures while demonstrating new state-of-the-art results using standard person re-ID evaluation protocols on three benchmarks.


Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors
This work presents a novel method - Generative Latent Nearest Neighbors (GLANN) - for training generative models without adversarial training that combines the strengths of IMLE and GLO in a way that overcomes the main drawbacks of each method.
Video-to-Video Synthesis
This paper proposes a novel video-to-video synthesis approach under the generative adversarial learning framework, capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis.
Optimizing the Latent Space of Generative Networks
Generative Latent Optimization (GLO) is a framework for training deep convolutional generators using simple reconstruction losses. It enjoys many of the desirable properties of GANs: synthesizing visually appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors, all without the adversarial optimization scheme.
Predicting Future Frames Using Retrospective Cycle GAN
  • Y. Kwon, M. Park
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
This paper proposes a unified generative adversarial network for predicting accurate and temporally consistent future frames over time, even in a challenging environment, and employs two discriminators not only to identify fake frames but also to distinguish image sequences that contain fake frames from the real sequence.
Dual Motion GAN for Future-Flow Embedded Video Prediction
A dual motion Generative Adversarial Net architecture is developed, which learns to explicitly enforce future-frame predictions to be consistent with the pixel-wise flows in the video through a dual-learning mechanism.
Improved Techniques for Training GANs
This work focuses on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic, and presents ImageNet samples with unprecedented resolution and shows that the methods enable the model to learn recognizable features of ImageNet classes.
Generating Videos with Scene Dynamics
A generative adversarial network for video is proposed, with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background; it can generate tiny videos up to a second long at full frame rate better than simple baselines.
Point-to-Point Video Generation
This work introduces point-to-point video generation, which controls the generation process with two control points, the targeted start and end frames, and proposes to maximize the modified variational lower bound of conditional data likelihood under a skip-frame training strategy.
Towards Understanding the Dynamics of Generative Adversarial Networks
This work proposes a simple model that exhibits several of the common problematic convergence behaviors and still allows the first convergence bounds for parametric GAN dynamics; the model and analysis point to a specific challenge in practical GAN training called discriminator collapse.
Probabilistic Video Generation using Holistic Attribute Control
This work improves video generation consistency through temporally-conditional sampling, and quality by structuring the latent space with attribute controls, ensuring that attributes can be both inferred and conditioned on during learning/generation.