Towards An End-to-End Framework for Flow-Guided Video Inpainting

@article{Li2022TowardsAE,
  title={Towards An End-to-End Framework for Flow-Guided Video Inpainting},
  author={Z. Li and Cheng Lu and Jia Qin and Chunle Guo and Ming-Ming Cheng},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.02663}
}
Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods by propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, these methods are less efficient and rely heavily on the intermediate results from earlier stages. In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting (E2FGVI) through…
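To make the flow-guided propagation idea concrete, below is a minimal sketch of filling masked pixels by warping known pixels from a neighboring frame along the flow. It is not the paper's implementation: the helper name, backward-flow convention, and nearest-neighbor sampling are all illustrative assumptions.

# Minimal sketch (assumptions, not E2FGVI's code): propagate known pixels from a
# neighboring source frame into the masked region of a target frame by following
# a backward optical flow field with nearest-neighbor sampling.
import numpy as np

def propagate_pixels(target, target_mask, source, source_mask, flow):
    """target, source: (H, W, 3); masks: (H, W) bool, True = missing;
    flow: (H, W, 2) backward flow, flow[y, x] = (dx, dy) into the source frame."""
    H, W = target_mask.shape
    filled = target.copy()
    still_missing = target_mask.copy()
    ys, xs = np.nonzero(target_mask)
    for y, x in zip(ys, xs):
        dx, dy = flow[y, x]
        sx, sy = int(round(x + dx)), int(round(y + dy))
        # Propagate only if the flow lands on a valid (known) source pixel.
        if 0 <= sx < W and 0 <= sy < H and not source_mask[sy, sx]:
            filled[y, x] = source[sy, sx]
            still_missing[y, x] = False
    return filled, still_missing  # leftover holes are handed to content synthesis

Pixels that no flow trajectory can reach remain missing, which is why flow-guided pipelines pair propagation with a generative completion step.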
3 Citations


Learning Task Agnostic Temporal Consistency Correction
TLDR
A novel general framework is proposed that learns to infer consistent motion dynamics from inconsistent videos to mitigate temporal flicker while preserving the perceptual quality for both temporally neighboring and relatively distant frames.
Towards Unified Keyframe Propagation Models
TLDR
This work presents a two-stream approach, where high-frequency features interact locally and low-frequency features interact globally, and evaluates it on inpainting tasks; experiments show that it improves both the propagation of features within a single frame, as required for image inpainting, and their propagation from keyframes to target frames.
PersonGONE: Image Inpainting for Automated Checkout Solution
In this paper, we present a solution for automatic checkout in a retail store as part of the AI City Challenge 2022. We propose a novel approach that uses the "removal" of unwanted objects — in this…

References

Showing 1-10 of 66 references
Image quality assessment: from error visibility to structural similarity
TLDR
A structural similarity index is developed and its promise is demonstrated through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
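For context, the structural similarity index compares local luminance, contrast, and structure between two image patches x and y; its standard form is:

\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where \mu_x, \mu_y are local means, \sigma_x^2, \sigma_y^2 local variances, \sigma_{xy} the covariance, and C_1, C_2 small stabilizing constants.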
Learning Blind Video Temporal Consistency
TLDR
An efficient approach based on a deep recurrent network for enforcing temporal consistency in a video that can handle multiple and unseen tasks, including but not limited to artistic style transfer, enhancement, colorization, image-to-image translation and intrinsic image decomposition.
Video-to-Video Synthesis
  • In NeurIPS, 2018
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
TLDR
This work proposes FuseFormer, a Transformer model designed for video inpainting via fine-grained feature fusion based on novel Soft Split and Soft Composition operations, which surpasses state-of-the-art methods in both quantitative and qualitative evaluations.
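A rough sketch of the overlapping split-and-recompose idea described above, expressed with torch unfold/fold; the kernel, stride, and padding values are illustrative assumptions, not FuseFormer's actual configuration.

# Sketch (assumptions, not FuseFormer's code): "soft split" extracts overlapping
# patches because the stride is smaller than the kernel, and "soft composition"
# stitches them back by summing overlaps and normalizing by the coverage count.
import torch
import torch.nn.functional as F

def soft_split(x, kernel=(7, 7), stride=(3, 3), padding=(3, 3)):
    # x: (B, C, H, W) -> (B, num_patches, C * kh * kw)
    patches = F.unfold(x, kernel_size=kernel, stride=stride, padding=padding)
    return patches.transpose(1, 2)

def soft_composition(patches, out_size, kernel=(7, 7), stride=(3, 3), padding=(3, 3)):
    # patches: (B, num_patches, C * kh * kw) -> (B, C, H, W)
    patches = patches.transpose(1, 2)
    summed = F.fold(patches, output_size=out_size, kernel_size=kernel,
                    stride=stride, padding=padding)
    # Normalize by how many overlapping patches covered each output pixel.
    ones = torch.ones_like(summed)
    counts = F.fold(F.unfold(ones, kernel_size=kernel, stride=stride, padding=padding),
                    output_size=out_size, kernel_size=kernel, stride=stride, padding=padding)
    return summed / counts.clamp(min=1e-6)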
Learning Joint Spatial-Temporal Transformations for Video Inpainting
TLDR
This paper simultaneously fills missing regions in all input frames via self-attention and proposes to optimize STTN with a spatial-temporal adversarial loss, demonstrating the superiority of the proposed model.
Copy-and-Paste Networks for Deep Video Inpainting
TLDR
A novel DNN-based framework, called Copy-and-Paste Networks, for video inpainting that takes advantage of additional information in other frames of the video, and is shown to significantly improve lane detection accuracy on road videos.
Learnable Gated Temporal Shift Module for Deep Video Inpainting
TLDR
This paper presents a novel component termed Learnable Gated Temporal Shift Module (LGTSM) for video inpainting models that could effectively tackle arbitrary video masks without additional parameters from 3D convolutions.
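A simplified sketch of the gated temporal shift idea named above; the channel fractions, 3x3 convolutions, and module layout are assumptions for illustration rather than the paper's exact LGTSM layer.

# Sketch (assumptions): a fraction of channels is shifted forward/backward along
# the time axis so 2D convolutions can see neighboring frames, and a learned
# sigmoid gate suppresses contributions from invalid (hole) regions.
import torch
import torch.nn as nn

class GatedTemporalShift(nn.Module):
    def __init__(self, channels, shift_div=8):
        super().__init__()
        self.shift_div = shift_div
        self.feature_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def temporal_shift(self, x):
        # x: (B, T, C, H, W)
        B, T, C, H, W = x.shape
        fold = C // self.shift_div
        out = torch.zeros_like(x)
        out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
        out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels unchanged
        return out

    def forward(self, x):
        B, T, C, H, W = x.shape
        shifted = self.temporal_shift(x).reshape(B * T, C, H, W)
        features = self.feature_conv(shifted)
        gates = torch.sigmoid(self.gate_conv(shifted))
        return (features * gates).reshape(B, T, C, H, W)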
Deep Video Inpainting
TLDR
This work proposes a novel deep network architecture for fast video inpainting built upon an image-based encoder-decoder model that is designed to collect and refine information from neighbor frames and synthesize still-unknown regions.
Focal Attention for Long-Range Interactions in Vision Transformers
TLDR
A new variant of Vision Transformer models, called Focal Transformers, is built, which achieve superior performance over the state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  • Ze Liu, Yutong Lin, B. Guo
  • 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021
TLDR
A hierarchical Transformer whose representation is computed with shifted windows, which has the flexibility to model at various scales, has linear computational complexity with respect to image size, and also proves beneficial for all-MLP architectures.
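A minimal sketch of window partitioning with a cyclic shift (assumed window size of 7; not the official implementation); attention is computed only inside fixed-size local windows, which is why the cost grows linearly with image size, and shifting the windows between layers lets information cross window borders.

# Sketch (assumptions): partition the feature map into non-overlapping windows,
# optionally after a cyclic shift so the next layer's windows straddle the
# previous layer's window boundaries.
import torch

def window_partition(x, window_size=7):
    # x: (B, H, W, C) with H and W divisible by window_size
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

def shifted_windows(x, window_size=7):
    # Cyclic shift by half a window before partitioning.
    shift = window_size // 2
    shifted = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    return window_partition(shifted, window_size)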