Layered neural atlases for consistent video editing

@article{Kasten2021LayeredNA,
  title={Layered neural atlases for consistent video editing},
  author={Yoni Kasten and Dolev Ofri and Oliver Wang and Tali Dekel},
  journal={ACM Transactions on Graphics (TOG)},
  year={2021},
  volume={40},
  pages={1--12}
}
We present a method that decomposes, and "unwraps", an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video. For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases, giving us a consistent parameterization of the video, along with an associated alpha (opacity) value. Importantly, we design our atlases to be interpretable and semantic, which facilitates…
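
As a rough sketch of this structure (not the authors' exact architecture; the network widths, the sigmoid squashing, and the single shared atlas network are assumptions for illustration), each layer has a mapping MLP from video coordinates to atlas coordinates, an alpha MLP gives the foreground opacity, and an atlas MLP decodes appearance; training compares the composited color against the input video, alongside the regularization terms described in the paper:

    import torch
    import torch.nn as nn

    def mlp(d_in, d_out, width=256, depth=4):
        layers, d = [], d_in
        for _ in range(depth - 1):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, d_out))
        return nn.Sequential(*layers)

    map_fg, map_bg = mlp(3, 2), mlp(3, 2)  # (x, y, t) -> atlas coordinate (u, v)
    alpha_net = mlp(3, 1)                  # (x, y, t) -> foreground opacity
    atlas_net = mlp(2, 3)                  # (u, v) -> RGB atlas appearance

    def reconstruct(p):  # p: (N, 3) batch of normalized (x, y, t) coordinates
        a = torch.sigmoid(alpha_net(p))
        c_fg = torch.sigmoid(atlas_net(map_fg(p)))
        c_bg = torch.sigmoid(atlas_net(map_bg(p)))
        return a * c_fg + (1 - a) * c_bg   # composited color at each pixel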

Deformable Sprites for Unsupervised Video Decomposition

Deformable Sprites are a type of video auto-encoder model that is optimized on individual videos; it does not require training on a large dataset, nor does it rely on pretrained models.

Text2LIVE: Text-Driven Layered Image and Video Editing

The key idea is to generate an edit layer (color + opacity) that is composited over the original input, which allows us to constrain the generation process and maintain high fidelity to the original input via novel text-driven losses applied directly to the edit layer.
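
The final frame is then a standard alpha composite, as in this minimal sketch (tensor shapes are assumptions):

    import torch

    def composite(original, edit_rgb, edit_alpha):
        # original, edit_rgb: (3, H, W) in [0, 1]; edit_alpha: (1, H, W) in [0, 1].
        # The text-driven (CLIP-based) losses are applied to the edit layer
        # itself, not only to this composite, which preserves the input.
        return edit_alpha * edit_rgb + (1 - edit_alpha) * original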

Temporally Consistent Semantic Video Editing

This work presents a simple yet effective method to facilitate temporally coherent video editing by adjusting the latent codes via an MLP and fine-tuning the generator G to achieve temporal consistency while preserving similarity to the direct editing results.

The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing

This work introduces the Anatomy of Video Editing, a dataset and benchmark suite, to foster research in AI-assisted video editing, and establishes competitive baseline methods and detailed analyses for each of the tasks.

Neural Parameterization for Dynamic Human Head Editing

This work presents Neural Parameterization (NeP), a hybrid representation that provides the advantages of both implicit and explicit methods and develops a hybrid 2D texture consisting of an explicit texture map for easy editing and implicit view and time-dependent residuals to model temporal and view variations.
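
A minimal sketch of such a hybrid texture lookup (residual_mlp and its exact inputs are assumptions for illustration, not the paper's architecture):

    import torch
    import torch.nn.functional as F

    def hybrid_color(tex, uv, residual_mlp, view_dir, t):
        # tex: (1, 3, H, W) explicit texture map that can be edited directly;
        # uv: (N, 2) texture coordinates in [-1, 1]; view_dir: (N, 3); t: (N, 1).
        base = F.grid_sample(tex, uv.view(1, 1, -1, 2), align_corners=True)
        base = base.view(3, -1).t()                    # (N, 3) explicit base color
        res = residual_mlp(torch.cat([uv, view_dir, t], dim=-1))
        return base + res   # implicit residual adds view/time-dependent variation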

DynIBaR: Neural Dynamic Image-Based Rendering

This work presents a new approach that addresses the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene by adopting a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views in a scene-motion-aware manner.

AI Video Editing: a Survey

This paper summarizes the development history of automatic video editing, especially the applications of AI in partial and full workflows, and reviews progress in the image editing domain, e.g., style transfer, retargeting, and colorization.

Decomposing NeRF for Editing via Feature Field Distillation

This work proposes to distill the knowledge of off-the-shelf, supervised and self-supervised 2D image feature extractors into a 3D feature field optimized in parallel to the radiance field, enabling query-based local editing of the represented 3D scenes.
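
A sketch of the distillation objective and the query-based selection it enables (function names are illustrative; the teacher could be, e.g., a self-supervised ViT feature extractor):

    import torch.nn.functional as F

    def distillation_loss(rendered_feat, teacher_feat):
        # rendered_feat: (N, D) features volume-rendered along N rays from the
        # 3D feature field; teacher_feat: (N, D) frozen 2D teacher features at
        # the corresponding pixels. Optimized in parallel to the radiance field.
        return F.mse_loss(rendered_feat, teacher_feat)

    def select_edit_region(sample_feat, query_feat, tau=0.8):
        # Local editing: keep 3D samples whose distilled feature matches a
        # query (e.g. the feature at a user-selected pixel).
        sim = F.cosine_similarity(sample_feat, query_feat[None, :], dim=-1)
        return sim > tau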

Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion

This work proposes to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns, and thus likely to correspond to objects, and applies this network in two modes.

Text-Driven Stylization of Video Objects

This work uses a pretrained atlas decomposition network to propagate the edits in a temporally consistent manner and demonstrates that the method can generate consistent style changes in time for a variety of objects and videos, that adhere to the specification of the target texts.

References


Layered neural rendering for retiming people in video

A key property of this model is that it not only disentangles the direct motions of each person in the input video, but also automatically correlates each person with the scene changes they generate, e.g., shadows, reflections, and motion of loose clothing.

Panoramic video textures

This paper describes a mostly automatic method for taking the output of a single panning video camera and creating a panoramic video texture (PVT): a video that has been stitched into a single, wide…

Omnimatte: Associating Objects and Their Effects in Video

This work estimates an omnimatte for each subject: an alpha matte and color image that includes the subject along with all of its related time-varying scene elements. Omnimattes are produced automatically for arbitrary objects and a variety of effects.

Unwrap mosaics: a new representation for video editing

A new representation for video is introduced which facilitates a number of common editing tasks and is designed to be easy to recover from a priori unseen and uncalibrated footage.

Consistent video depth estimation

An algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video by using a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation.

LayerBuilder: Layer Decomposition for Interactive Image and Video Color Editing

LayerBuilder is presented, an algorithm that decomposes an image or video into a linear combination of colored layers to facilitate color-editing applications and shows how this representation can benefit other applications, such as automatic recoloring suggestion, texture synthesis, and color-based filtering.
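
A toy version of the linear model (the real algorithm adds constraints such as non-negativity and spatial coherence; this unconstrained least-squares fit is an assumption for illustration):

    import numpy as np

    def decompose(image, palette):
        # image: (H, W, 3) in [0, 1]; palette: (K, 3) layer colors.
        # Fit per-pixel weights w so that image ~= sum_k w[k] * palette[k].
        H, W, _ = image.shape
        pixels = image.reshape(-1, 3).T                         # (3, H*W)
        w, *_ = np.linalg.lstsq(palette.T, pixels, rcond=None)  # (K, H*W)
        return w.T.reshape(H, W, -1)

    def recolor(weights, new_palette):
        # Color editing: recombine the stored weights with edited layer colors.
        return np.clip(weights @ new_palette, 0.0, 1.0)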

Mosaic based representations of video sequences and their applications

This paper systematically investigates how to go beyond thinking of the mosaic simply as a visualization device and instead use it as a basis for efficient representation of video sequences, providing representations at multiple spatial and temporal resolutions and handling 3D scene information.

Interactive video stylization using few-shot patch-based training

This paper demonstrates how to train an appearance translation network from scratch using only a few stylized exemplars while implicitly preserving temporal consistency, leading to a video stylization framework that supports real-time inference, parallel processing, and random access to an arbitrary output frame.
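
The training loop amounts to sampling co-located patches from a few keyframe/stylized-exemplar pairs, as in this hedged sketch (net is any small image-to-image network; the patch size and L1 loss are assumptions):

    import torch
    import torch.nn.functional as F

    def train_step(net, opt, frame, exemplar, patch=32, batch=64):
        # frame, exemplar: (3, H, W) aligned keyframe and its stylized version.
        _, H, W = frame.shape
        ys = torch.randint(0, H - patch, (batch,)).tolist()
        xs = torch.randint(0, W - patch, (batch,)).tolist()
        src = torch.stack([frame[:, y:y+patch, x:x+patch] for y, x in zip(ys, xs)])
        tgt = torch.stack([exemplar[:, y:y+patch, x:x+patch] for y, x in zip(ys, xs)])
        loss = F.l1_loss(net(src), tgt)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

Because the trained network maps each frame independently, the remaining frames can be stylized in parallel and in any order.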

Video indexing based on mosaic representations

A new set of methods for indexing into a video sequence, based on the scene-based representation and on the geometric and dynamic information contained in the video, complements the more traditional content-based indexing methods.

Learning Continuous Image Representation with Local Implicit Image Function

This paper proposes the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate, building a bridge between discrete and continuous image representations in 2D.
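
A minimal nearest-latent-code variant of this idea (the paper additionally uses local ensembling and cell decoding; this simplification is an assumption):

    import torch
    import torch.nn as nn

    class LIIF(nn.Module):
        def __init__(self, feat_dim=64):
            super().__init__()
            self.decoder = nn.Sequential(
                nn.Linear(feat_dim + 2, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 3))

        def forward(self, feats, coords):
            # feats: (C, h, w) encoder feature map; coords: (N, 2) in [-1, 1].
            C, h, w = feats.shape
            ix = ((coords[:, 0] + 1) / 2 * (w - 1)).round().long().clamp(0, w - 1)
            iy = ((coords[:, 1] + 1) / 2 * (h - 1)).round().long().clamp(0, h - 1)
            z = feats[:, iy, ix].t()                      # (N, C) nearest codes
            cx = ix.float() / (w - 1) * 2 - 1             # coordinate of that code
            cy = iy.float() / (h - 1) * 2 - 1
            rel = coords - torch.stack([cx, cy], dim=-1)  # relative offset
            return self.decoder(torch.cat([z, rel], dim=-1))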