Corpus ID: 237213539

Click to Move: Controlling Video Generation with Sparse Motion

@inproceedings{ardino2021click,
  title={Click to Move: Controlling Video Generation with Sparse Motion},
  author={Pierfrancesco Ardino and Marco De Nadai and Bruno Lepri and Elisa Ricci and St{\'e}phane Lathuili{\`e}re},
}
This paper introduces Click to Move (C2M), a novel framework for video generation in which the user controls the motion of the synthesized video through mouse clicks specifying simple trajectories for the key objects in the scene. The model receives as input an initial frame, its corresponding segmentation map, and the sparse motion vectors encoding the input provided by the user. It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with…
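The abstract describes three inputs: an initial frame, its segmentation map, and sparse motion vectors derived from user clicks. A minimal sketch of how such inputs might be assembled is shown below; the function name `make_sparse_motion_map` and the click encoding are illustrative assumptions, not the authors' actual API:

```python
import numpy as np

def make_sparse_motion_map(clicks, height, width):
    """Encode user clicks as a sparse motion field.

    clicks: list of ((y, x), (dy, dx)) pairs -- a clicked pixel on an
    object and the displacement the user wants it to undergo.
    Returns an (H, W, 2) array that is zero everywhere except at the
    clicked locations (a hypothetical encoding, for illustration only).
    """
    motion = np.zeros((height, width, 2), dtype=np.float32)
    for (y, x), (dy, dx) in clicks:
        motion[y, x] = (dy, dx)
    return motion

# Assemble the three inputs the paper describes: an initial frame,
# its segmentation map, and the sparse motion vectors from clicks.
H, W = 64, 64
first_frame = np.zeros((H, W, 3), dtype=np.uint8)   # placeholder RGB frame
segmentation = np.zeros((H, W), dtype=np.int64)     # placeholder class IDs
clicks = [((10, 20), (5.0, 0.0))]                   # move one object 5 px down
sparse_motion = make_sparse_motion_map(clicks, H, W)
```

The sparse field is zero almost everywhere, which matches the paper's premise that a handful of clicked trajectories is enough to condition the generated motion.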


Controllable Video Generation with Sparse Trajectories
This work presents a conditional video generation model that allows detailed control over the motion of the generated video, and proposes a training paradigm that calculates trajectories from video clips, eliminating the need for annotated training data.
Animating Arbitrary Objects via Deep Motion Transfer
This paper introduces a novel deep learning framework for image animation that, through a deep architecture decoupling appearance and motion information, generates a video in which the target object is animated according to the driving sequence.
Future Video Synthesis With Object Motion Prediction
An approach is presented to predict future video frames given a sequence of past video frames by decoupling the background scene from the moving objects; the model is shown to outperform the state of the art in terms of visual quality and accuracy.
High-Quality Video Generation from Static Structural Annotations
This paper proposes a novel unsupervised video generation method that is conditioned on a single structural annotation map, which, in contrast to prior conditional video generation approaches, provides a…
First Order Motion Model for Image Animation
This framework decouples appearance and motion information using a self-supervised formulation and uses a representation consisting of a set of learned keypoints along with their local affine transformations to support complex motions.
Probabilistic Video Generation using Holistic Attribute Control
This work improves video generation consistency through temporally-conditioned sampling and improves quality by structuring the latent space with attribute controls, ensuring that attributes can be both inferred and conditioned on during learning and generation.
Flow-Grounded Spatial-Temporal Video Prediction from Still Images
This work formulates the multi-frame prediction task as a multiple-time-step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase, which prevents the model from directly operating in the high-dimensional pixel space of the frame sequence and is demonstrated to produce better and more diverse predictions.
Video Generation From Single Semantic Label Map
A cVAE for predicting optical flow is employed as a beneficial intermediate step to generate a video sequence conditioned on the initial single frame, and a semantic label map is integrated into the flow prediction module to achieve major improvements in the image-to-video generation process.
SDC-Net: Video Prediction Using Spatially-Displaced Convolution
The SDC module for video frame prediction, based on spatially-displaced convolution, inherits the merits of both vector-based and kernel-based approaches while ameliorating their respective disadvantages.
MoCoGAN: Decomposing Motion and Content for Video Generation
This work introduces a novel adversarial learning scheme utilizing both image and video discriminators and shows that MoCoGAN allows one to generate videos with the same content but different motion, as well as videos with different content and the same motion.