Self-Supervised Equivariant Scene Synthesis from Video
@article{Resnick2021SelfSupervisedES,
  title   = {Self-Supervised Equivariant Scene Synthesis from Video},
  author  = {Cinjon Resnick and Or Litany and Cosmas Hei{\ss} and H. Larochelle and Joan Bruna and Kyunghyun Cho},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2102.00863}
}
We propose a self-supervised framework to learn scene representations from video that are automatically delineated into background, characters, and their animations. Our method capitalizes on the fact that moving characters are equivariant with respect to their transformations across frames, while the background is invariant under those same transformations. After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components. As far as we know, we…
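As a rough illustration of the objective sketched in the abstract, the following PyTorch snippet shows one way an equivariance loss on the character code and an invariance loss on the background code could be combined with a reconstruction term. The `encoder`, `decoder`, and `transform` callables are hypothetical stand-ins, not the authors' architecture.

```python
# Minimal sketch of the equivariance/invariance objective, assuming a
# hypothetical encoder that splits a frame into background and character
# codes, and a transform that moves the character code between frames.
import torch

def scene_losses(encoder, decoder, frame_t, frame_t1, transform):
    bg_t, char_t = encoder(frame_t)      # delineated background / character codes
    bg_t1, char_t1 = encoder(frame_t1)
    # Background is invariant to the character's motion.
    loss_inv = torch.mean((bg_t - bg_t1) ** 2)
    # The character code is equivariant: transforming the code at time t
    # should land on the code at time t+1.
    loss_equi = torch.mean((transform(char_t) - char_t1) ** 2)
    # Reconstruction ties the codes back to pixels, which is what allows
    # recombining components at test time.
    loss_rec = torch.mean((decoder(bg_t1, transform(char_t)) - frame_t1) ** 2)
    return loss_inv + loss_equi + loss_rec
```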
References
Showing 1-10 of 27 references
First Order Motion Model for Image Animation
- Computer Science, NeurIPS
- 2019
This framework decouples appearance and motion information using a self-supervised formulation, representing motion as a set of learned keypoints together with their local affine transformations in order to support complex motions.
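The keypoint-plus-local-affine representation above can be read as a first-order Taylor expansion of the motion around each keypoint. A hedged sketch of how a dense motion field could be assembled from it, where the Gaussian weighting and all tensor shapes are illustrative assumptions (the paper predicts the assignment masks with a network):

```python
# Assembling a dense flow field from keypoints and local affine Jacobians.
import torch

def dense_flow(kp_src, kp_drv, affine, grid, sigma=0.1):
    """kp_src, kp_drv: (K, 2) keypoints in the source / driving frames;
    affine: (K, 2, 2) local affine Jacobians; grid: (H, W, 2) identity grid."""
    # First-order expansion around each driving keypoint:
    # z -> kp_src + A (z - kp_drv)
    diffs = grid[None] - kp_drv[:, None, None, :]                       # (K, H, W, 2)
    warped = kp_src[:, None, None, :] + torch.einsum(
        'kij,khwj->khwi', affine, diffs)                                # (K, H, W, 2)
    # Soft assignment of each pixel to a keypoint (assumption: Gaussian
    # weights around the driving keypoints).
    w = torch.softmax(-(diffs ** 2).sum(-1) / (2 * sigma ** 2), dim=0)  # (K, H, W)
    return (w[..., None] * warped).sum(0)                               # (H, W, 2)
```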
Generating Videos with Scene Dynamics
- Computer Science, NIPS
- 2016
A generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from its background is proposed; it can generate tiny videos up to a second long at full frame rate better than simple baselines.
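The foreground/background untangling in that model comes down to mask-based compositing. A minimal sketch, with tensor shapes assumed for illustration:

```python
# Two-stream compositing: a generated foreground video is blended over a
# static background via a per-pixel mask.
import torch

def compose(fg, mask, bg):
    """fg: (T, C, H, W) foreground video; mask: (T, 1, H, W) in [0, 1];
    bg: (C, H, W) static background image."""
    return mask * fg + (1 - mask) * bg.unsqueeze(0)  # broadcast bg over time
```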
Animating Arbitrary Objects via Deep Motion Transfer
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This paper introduces a deep learning framework for image animation that decouples appearance and motion information, generating a video in which a target object is animated according to a driving sequence.
Motion-supervised Co-Part Segmentation
- Computer Science, 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
This work proposes a self-supervised deep learning method for co-part segmentation that develops the idea that motion information inferred from videos can be leveraged to discover meaningful object parts.
DwNet: Dense warp-based network for pose-guided human video generation
- Computer Science, BMVC
- 2019
This paper focuses on human motion transfer: the generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video.
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
- Computer Science, NIPS
- 2016
A Cross Convolutional Network is proposed that models future frames probabilistically, encoding image and motion information as feature maps and convolutional kernels, respectively.
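The cross-convolution idea above combines the two encodings by convolving each sample's feature maps with its own predicted kernels. A sketch, with shapes as illustrative assumptions:

```python
# Cross convolution: per-sample, per-channel (depthwise) motion kernels
# applied to image feature maps via grouped convolution.
import torch
import torch.nn.functional as F

def cross_convolve(feat, kernels):
    """feat: (B, C, H, W) image feature maps; kernels: (B, C, k, k) motion
    kernels, one per channel per sample."""
    B, C, H, W = feat.shape
    k = kernels.shape[-1]
    # Fold the batch into channels so each sample is convolved with its own
    # kernels (one group per (sample, channel) pair).
    out = F.conv2d(feat.reshape(1, B * C, H, W),
                   kernels.reshape(B * C, 1, k, k),
                   padding=k // 2, groups=B * C)
    return out.reshape(B, C, H, W)
```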
Learning a Generative Model of Images by Factoring Appearance and Shape
- Computer Science, Neural Computation
- 2011
This work introduces a basic model, the masked RBM, which explicitly models occlusion boundaries in image patches by factoring the appearance of any patch region from its shape, and proposes a generative model of larger images using a field of such RBMs.
Unsupervised Learning of Disentangled Representations from Video
- Computer Science, NIPS
- 2017
We present a new model, DrNET, that learns disentangled image representations from video. Our approach leverages the temporal coherence of video and a novel adversarial loss to learn a representation…
Decomposing Motion and Content for Natural Video Sequence Prediction
- Computer Science, ICLR
- 2017
To the best of the authors' knowledge, this is the first end-to-end trainable network architecture with motion and content separation to model the spatiotemporal dynamics for pixel-level future prediction in natural videos.
MoCoGAN: Decomposing Motion and Content for Video Generation
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work introduces a novel adversarial learning scheme utilizing both image and video discriminators, and shows that MoCoGAN allows one to generate videos with the same content but different motion, as well as videos with different content and the same motion.
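The content/motion decomposition above can be made concrete through the latent sampling it implies: one content code per clip and a per-frame motion code from a recurrent network. A minimal sketch under assumed dimensions and module choices (the GRU cell and sizes are illustrative, not the paper's exact configuration):

```python
# MoCoGAN-style latent sampling: a fixed content code for the clip plus a
# fresh motion code per frame, produced by a recurrent cell.
import torch
import torch.nn as nn

def sample_latents(T, dim_content=50, dim_motion=10):
    rnn = nn.GRUCell(dim_motion, dim_motion)
    z_content = torch.randn(1, dim_content)          # fixed for the whole clip
    h = torch.zeros(1, dim_motion)
    zs = []
    for _ in range(T):
        h = rnn(torch.randn(1, dim_motion), h)       # new motion code per frame
        zs.append(torch.cat([z_content, h], dim=1))  # [content | motion]
    return torch.cat(zs, dim=0)                      # (T, dim_content + dim_motion)
```

Holding `z_content` fixed while resampling the motion sequence yields the "same content, different motion" behavior described above; swapping `z_content` under a fixed motion sequence yields the converse.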