Decoupled Spatial-Temporal Transformer for Video Inpainting
@article{Liu2021DecoupledST,
  title   = {Decoupled Spatial-Temporal Transformer for Video Inpainting},
  author  = {R. Liu and Hanming Deng and Yangyi Huang and Xiaoyu Shi and Lewei Lu and Wenxiu Sun and Xiaogang Wang and Jifeng Dai and Hongsheng Li},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2104.06637}
}
Video inpainting aims to fill the given spatiotemporal holes with realistic appearance but is still a challenging task even with prosperous deep learning approaches. Recent works introduce the promising Transformer architecture into deep video inpainting and achieve better performance. However, it still suffers from synthesizing blurry texture as well as huge computational cost. Towards this end, we propose a novel Decoupled Spatial-Temporal Transformer (DSTT) for improving video inpainting…
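The decoupling idea named in the abstract, replacing one full spatio-temporal self-attention over all T·H·W tokens with a temporal-only pass followed by a spatial-only pass, can be illustrated with a minimal numpy sketch. This is illustrative only, not the authors' DSTT implementation; the single-head attention without learned projections and the token shapes are simplifying assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # plain scaled dot-product attention over the second-to-last axis
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def temporal_attention(x):
    # x: (T, N, C) video tokens; attend across time, one spatial token at a time
    xt = np.swapaxes(x, 0, 1)                      # (N, T, C)
    return np.swapaxes(attention(xt, xt, xt), 0, 1)

def spatial_attention(x):
    # attend across the N spatial tokens within each frame (batched over T)
    return attention(x, x, x)

T, N, C = 4, 16, 8                                 # N = H * W spatial tokens
x = np.random.randn(T, N, C)
y = spatial_attention(temporal_attention(x))
print(y.shape)  # (4, 16, 8)
```

The payoff is in the score matrices: the decoupled passes build (T, T) and (N, N) attention maps instead of one (T·N, T·N) map, which is where the computational saving over joint spatio-temporal attention comes from.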
7 Citations
Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting
- Computer Science · ArXiv
- 2021
STRA-Net, a novel spatial-temporal residual aggregation framework for high-resolution video inpainting, is proposed; it produces more temporally coherent and visually appealing results than state-of-the-art methods on high-resolution videos.
A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its effect on Image Correspondence
- Computer Science · ArXiv
- 2022
This paper proposes using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities, inferring its appearance spatially and from neighbouring frames where they are not present in the same location.
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition
- Computer Science · ArXiv
- 2021
The seamless combination of these novel designs forms a robust spatial-temporal representation and achieves better performance than state-of-the-art methods on four public motion datasets.
Visual Attention Network
- Computer Science · ArXiv
- 2022
A novel large kernel attention (LKA) module is proposed to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues and a novel neural network based on LKA is introduced, namely Visual Attention Network (VAN).
Attention Mechanisms in Computer Vision: A Survey
- Computer Science · Comput. Vis. Media
- 2022
This survey provides a comprehensive review of attention mechanisms in computer vision and categorizes them by approach, such as channel attention, spatial attention, temporal attention, and branch attention.
MABC‐EPF: Video in‐painting technique with enhanced priority function and optimal patch search algorithm
- Computer Science · Concurr. Comput. Pract. Exp.
- 2022
The proposed video in-painting technique with an enhanced priority function and an optimal patch searching algorithm outperforms prior state-of-the-art methods, attaining peak signal-to-noise ratio (PSNR) improvements of 8.06%, 7.90%, 32.15%, and 13.06% over previous methods.
Towards An End-to-End Framework for Flow-Guided Video Inpainting
- Computer Science · ArXiv
- 2022
This work proposes an end-to-end framework for flow-guided video inpainting (E2FGVI) built from three elaborately designed trainable modules: flow completion, feature propagation, and content hallucination. The modules can be jointly optimized, leading to a more efficient and effective inpainting process.
References
Showing 1-10 of 41 references
Learning Joint Spatial-Temporal Transformations for Video Inpainting
- Computer Science · ECCV
- 2020
This paper simultaneously fills missing regions in all input frames by self-attention and proposes to optimize STTN with a spatial-temporal adversarial loss, demonstrating the superiority of the proposed model.
Deep Video Inpainting
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This work proposes a novel deep network architecture for fast video inpainting built upon an image-based encoder-decoder model that is designed to collect and refine information from neighbor frames and synthesize still-unknown regions.
Deep Flow-Guided Video Inpainting
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This work first synthesizes a spatially and temporally coherent optical flow field across video frames using a newly designed Deep Flow Completion network, then uses the synthesized flow fields to guide the propagation of pixels to fill up the missing regions in the video.
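The propagation step this entry describes, using a completed flow field to copy pixels from a neighboring frame into the holes, can be sketched in a few lines of numpy. This is a toy nearest-neighbor warp for illustration, not the paper's Deep Flow Completion network; the shapes and the identity flow in the demo are assumptions:

```python
import numpy as np

def warp_nearest(frame, flow):
    # frame: (H, W) grayscale; flow: (H, W, 2) mapping target pixel (y, x)
    # to source location (y + flow[..., 0], x + flow[..., 1]) in `frame`
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    sy = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    sx = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return frame[sy, sx]

def fill_with_flow(target, mask, neighbor, flow):
    # copy flow-aligned neighbor pixels into the masked holes of `target`
    warped = warp_nearest(neighbor, flow)
    return np.where(mask, warped, target)

target = np.zeros((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                       # a 2x2 hole to fill
neighbor = np.arange(16.0).reshape(4, 4)    # a neighboring frame
flow = np.zeros((4, 4, 2))                  # identity flow for the demo
filled = fill_with_flow(target, mask, neighbor, flow)
print(filled[1, 1], filled[0, 0])  # 5.0 0.0
```

Real systems use bilinear sampling and chain warps across many frames, but the principle is the same: a coherent flow field tells the inpainter where valid pixels for each hole already exist.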
Learnable Gated Temporal Shift Module for Deep Video Inpainting
- Computer Science
- 2019
This paper presents a novel component termed Learnable Gated Temporal Shift Module (LGTSM) for video inpainting models that could effectively tackle arbitrary video masks without additional parameters from 3D convolutions.
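A temporal shift of the kind LGTSM builds on, in its plain non-learnable form, moves a slice of channels one frame forward in time and another slice one frame backward, so that an ordinary 2D convolution afterwards mixes information across frames at no extra parameter cost. A minimal numpy sketch follows; the fold factor of 4 and zero padding at the clip boundaries are conventional choices, not taken from this paper:

```python
import numpy as np

def temporal_shift(x, fold=4):
    # x: (T, C, H, W) clip features; shift the first C // fold channels
    # forward in time and the next C // fold backward in time
    T, C, H, W = x.shape
    f = C // fold
    out = x.copy()
    out[1:, :f] = x[:-1, :f]         # forward shift: frame t sees frame t-1
    out[0, :f] = 0                   # zero-pad the first frame
    out[:-1, f:2 * f] = x[1:, f:2 * f]  # backward shift: frame t sees frame t+1
    out[-1, f:2 * f] = 0             # zero-pad the last frame
    return out

x = np.random.randn(3, 8, 2, 2)
y = temporal_shift(x)
print(np.allclose(y[2, :2], x[1, :2]))  # True
```

LGTSM's contribution is to make both the shift kernel and a gating of shifted features learnable so the module handles arbitrary mask shapes; the fixed shift above is only the starting point.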
Video Inpainting by Jointly Learning Temporal Structure and Spatial Details
- Computer Science · AAAI
- 2019
A novel deep learning architecture is proposed that contains two subnetworks, a temporal structure inference network and a spatial detail recovering network, which are jointly trained in an end-to-end manner.
Copy-and-Paste Networks for Deep Video Inpainting
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
A novel DNN-based framework called Copy-and-Paste Networks for video inpainting that takes advantage of additional information in other frames of the video; the restored videos also significantly improve lane detection accuracy on road videos.
An Internal Learning Approach to Video Inpainting
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent 'Deep Image Prior' (DIP) that…
Generative Image Inpainting with Contextual Attention
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work proposes a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.
Proposal-Based Video Completion
- Computer Science · ECCV
- 2020
This paper uses 3D convolutions to obtain an initial inpainting estimate, which is subsequently refined by fusing a generated set of proposals; the proposals provide a rich source of information, permitting the combination of similar-looking patches that may be spatially and temporally far from the region to be inpainted.
Learning Blind Video Temporal Consistency
- Computer Science · ECCV
- 2018
An efficient approach based on a deep recurrent network for enforcing temporal consistency in a video that can handle multiple and unseen tasks, including but not limited to artistic style transfer, enhancement, colorization, image-to-image translation and intrinsic image decomposition.