Accelerating Video Object Segmentation with Compressed Video

  • Kai-yu Xu, Angela Yao
  • Published 26 July 2021
  • Computer Science
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
We propose an efficient plug-and-play acceleration framework for semi-supervised video object segmentation by exploiting the temporal redundancies in videos presented by the compressed bitstream. Specifically, we propose a motion vector-based warping method for propagating segmentation masks from keyframes to other frames in a bidirectional and multi-hop manner. Additionally, we introduce a residual-based correction module that can fix wrongly propagated segmentation masks from noisy or… 
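As a rough illustration of the motion vector-based warping idea mentioned in the abstract, the sketch below propagates a keyframe mask to another frame using block-level motion vectors. It is a minimal single-hop example under stated assumptions, not the authors' implementation: the function name `warp_mask_with_motion_vectors`, the `(dy, dx)` vector convention (each block points to its source location in the keyframe), and the fixed `block_size` are all hypothetical, and the bidirectional, multi-hop propagation and the residual-based correction described in the paper are not modeled.

```python
import numpy as np


def warp_mask_with_motion_vectors(key_mask, motion_vectors, block_size=16):
    """Warp a keyframe label mask to another frame (hypothetical helper).

    key_mask:       (H, W) integer array of object labels on the keyframe.
    motion_vectors: (ceil(H/block), ceil(W/block), 2) integer array; entry
                    [by, bx] = (dy, dx) is the assumed offset from a pixel in
                    block (by, bx) of the target frame to its source pixel
                    in the keyframe.
    """
    h, w = key_mask.shape

    # Expand block-level vectors so every pixel carries its block's offset.
    per_pixel = np.repeat(motion_vectors, block_size, axis=0)
    per_pixel = np.repeat(per_pixel, block_size, axis=1)[:h, :w]

    # Coordinates of every pixel in the target frame.
    ys, xs = np.mgrid[0:h, 0:w]

    # Source coordinates in the keyframe, clamped to the image bounds.
    src_y = np.clip(ys + per_pixel[..., 0], 0, h - 1)
    src_x = np.clip(xs + per_pixel[..., 1], 0, w - 1)

    # Nearest-neighbour lookup keeps labels discrete (no interpolation).
    return key_mask[src_y, src_x]
```

Nearest-neighbour lookup is used here so that integer object labels are never blended; a real codec also carries sub-pixel vectors and variable block sizes, which this sketch ignores.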

Real Time Video Object Segmentation in Compressed Domain

Proposes a propagation-based segmentation method in the compressed domain to accelerate inference, together with a residual supplement module that restores appearance information lost in direct warping and a spatial attention module that mines extra spatial saliency to locate the specified object.

Fast Video Object Segmentation by Reference-Guided Mask Propagation

A deep Siamese encoder-decoder network is proposed that takes advantage of mask propagation and object detection while avoiding the weaknesses of both approaches, achieving accuracy competitive with state-of-the-art methods while running in a fraction of the time.

Fast Object Detection in Compressed Video

This paper proposes a fast object detection method that takes advantage of both motion vectors and residual errors, which are freely available in video streams, and is the first work to investigate a deep convolutional detector on compressed videos.

Video Object Segmentation without Temporal Information

Semantic One-Shot Video Object Segmentation is presented, based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot).

Learning Video Object Segmentation from Static Images

It is demonstrated that highly accurate object segmentation in videos can be enabled by a convolutional neural network (convnet) trained with static images only, using a combination of offline and online learning strategies.

Efficient Video Semantic Segmentation with Labels Propagation and Refinement

The proposed Efficient Video Segmentation (EVS) pipeline achieves accuracy competitive with existing real-time methods for semantic image segmentation (mIoU above 60%), while achieving much higher frame rates.

Compressed Domain Video Object Segmentation

This work presents a compressed domain video object segmentation method for MPEG-encoded video sequences that generates accurate segmentation maps in block resolution at hierarchically varying object levels, which lets the application determine the most pertinent partition of images.

Video object segmentation: a compressed domain approach

A method is presented for automatically estimating the number of objects and extracting independently moving video objects using motion vectors, and a strategy for edge refinement is proposed to extract the precise object boundaries.

Low-Latency Video Semantic Segmentation

A framework for video semantic segmentation is developed, which incorporates two novel components: a feature propagation module that adaptively fuses features over time via spatially variant convolution, thus reducing the cost of per-frame computation, and an adaptive scheduler that dynamically allocates computation based on accuracy prediction.

PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation

This work addresses semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations, with the PReMVOS algorithm.