Short-Term and Long-Term Context Aggregation Network for Video Inpainting

Ang Li, Shanshan Zhao, Xingjun Ma, Mingming Gong, Jianzhong Qi, Rui Zhang, Dacheng Tao, Ramamohanarao Kotagiri
Video inpainting aims to restore missing regions of a video and has many applications, such as video editing and object removal. However, existing methods either suffer from inaccurate short-term context aggregation or rarely explore long-term frame information. In this work, we present a novel context aggregation network to effectively exploit both short-term and long-term frame information for video inpainting. In the encoding stage, we propose boundary-aware short-term context aggregation…


Internal Video Inpainting by Implicit Long-range Propagation
This work proposes a novel framework for video inpainting that adopts an internal learning strategy, implicitly fitting a convolutional neural network to the known regions, in order to handle challenging sequences with ambiguous backgrounds or long-term occlusion.
Decoupled Spatial-Temporal Transformer for Video Inpainting
This work proposes a novel Decoupled Spatial-Temporal Transformer (DSTT) for improving video inpainting with exceptional efficiency, and achieves better performance than state-of-the-art video inpainting approaches with significantly boosted efficiency.
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
  • Rui Liu, Hanming Deng, +6 authors Hongsheng Li
  • Computer Science
  • ArXiv
  • 2021
This work proposes FuseFormer, a Transformer model designed for video inpainting via fine-grained feature fusion based on novel Soft Split and Soft Composition operations, which surpasses state-of-the-art methods in both quantitative and qualitative evaluations.
Flow-Guided Video Inpainting with Scene Templates
A generative model of images in relation to the scene (without missing regions) and mappings from the scene to images is introduced, which ensures consistency of the frame-to-frame flows generated with the underlying scene, reducing geometric distortions in flow-based inpainting.
Deep Face Video Inpainting via UV Mapping
This paper proposes a two-stage deep learning method for face video inpainting that can significantly outperform methods based merely on 2D information, especially for faces under large pose and expression variations.
Hybridized Cuckoo Search with Multi-Verse Optimization-Based Patch Matching and Deep Learning Concept for Enhancing Video Inpainting
The hybridization of two meta-heuristic algorithms, the cuckoo search algorithm and multi-verse optimization (MVO), called Cuckoo Search-based MVO, is used to optimize the RNN and patch matching; the experimental results verify the reliability of the proposed algorithm over existing algorithms.
Backdoor Attack with Sample-Specific Triggers
Inspired by the recent advance in DNN-based image steganography, sample-specific invisible additive noises as backdoor triggers are generated by encoding an attacker-specified string into benign images through an encoder-decoder network.
Progressive Temporal Feature Alignment Network for Video Inpainting
‘Progressive Temporal Feature Alignment Network’ is proposed, which progressively enriches features extracted from the current frame with the feature warped from neighbouring frames using optical flow, greatly improving visual quality and temporal consistency of the inpainted videos.


Copy-and-Paste Networks for Deep Video Inpainting
A novel DNN-based framework called the Copy-and-Paste Networks for video inpainting that takes advantage of additional information in other frames of the video, and can also significantly improve lane detection accuracy on road videos.
Deep Video Inpainting
This work proposes a novel deep network architecture for fast video inpainting built upon an image-based encoder-decoder model that is designed to collect and refine information from neighbor frames and synthesize still-unknown regions.
Video Inpainting by Jointly Learning Temporal Structure and Spatial Details
A novel deep learning architecture is proposed which contains two sub-networks, a temporal structure inference network and a spatial detail recovering network, and jointly trains both sub-networks in an end-to-end manner.
Deep Flow-Guided Video Inpainting
This work first synthesizes a spatially and temporally coherent optical flow field across video frames using a newly designed Deep Flow Completion network, then uses the synthesized flow fields to guide the propagation of pixels to fill up the missing regions in the video.
Learning Blind Video Temporal Consistency
An efficient approach based on a deep recurrent network for enforcing temporal consistency in a video that can handle multiple and unseen tasks, including but not limited to artistic style transfer, enhancement, colorization, image-to-image translation and intrinsic image decomposition.
Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting
This paper proposes a Pyramid-context Encoder Network for image inpainting by deep generative models, built upon a U-Net structure with three tailored components, i.e.…
High-Resolution Image Inpainting Using Multi-scale Neural Patch Synthesis
This work proposes a multi-scale neural patch synthesis approach based on joint optimization of image content and texture constraints, which not only preserves contextual structures but also produces high-frequency details by matching and adapting patches with the most similar mid-layer feature correlations of a deep classification network.
Free-Form Video Inpainting With 3D Gated Convolution and Temporal PatchGAN
A deep learning based free-form video inpainting model is introduced, with proposed 3D gated convolutions to tackle the uncertainty of free-form masks and a novel Temporal PatchGAN loss to enhance temporal consistency.
An Internal Learning Approach to Video Inpainting
We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent 'Deep Image Prior' (DIP) that…
Generative Image Inpainting with Contextual Attention
This work proposes a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.