Video Instance Segmentation

@inproceedings{Yang2019VideoIS,
  title={Video Instance Segmentation},
  author={Linjie Yang and Yuchen Fan and Ning Xu},
  booktitle={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019},
  pages={5187--5196}
}
  • Linjie Yang, Yuchen Fan, N. Xu
  • Published 12 May 2019
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
In this paper we present a new computer vision task, named video instance segmentation. [...] To facilitate research on this new task, we propose a large-scale benchmark called YouTube-VIS, which consists of 2883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks. In addition, we propose a novel algorithm called MaskTrack R-CNN for this task.

Dual Embedding Learning for Video Instance Segmentation
TLDR
A novel two-stage framework to generate high-quality segmentation results for the video instance segmentation task, which requires simultaneous detection, segmentation and tracking of instances.
Limited Sampling Reference Frame for MaskTrack R-CNN
TLDR
A refinement model is used to detect and segment instances, achieving better tracking accuracy in long videos, and a Stochastic Weight Averaging training strategy is applied to further improve the result.
Video Instance Segmentation with a Propose-Reduce Paradigm
TLDR
This work proposes a new paradigm, Propose-Reduce, to generate complete sequences for input videos in a single step, and builds a sequence propagation head on an existing image-level instance segmentation network for long-term propagation.
MSN: Efficient Online Mask Selection Network for Video Instance Segmentation
TLDR
This work presents a novel solution for Video Instance Segmentation (VIS), that is, automatically generating instance-level segmentation masks along with object classes and tracking them in a video, using the Mask Selection Network (MSN).
Temporal Feature Augmented Network for Video Instance Segmentation
TLDR
A track head is added to an instance segmentation network to track object instances across frames to make better use of the rich information contained in the video.
Simple Video Instance Segmentation with ResNet and Transformer
TLDR
Based on the recent VIS model VisTR, a competitive VIS model called SimpleVTR is proposed, which trades off the computing resources and the effectiveness of an end-to-end video instance segmentation algorithm.
Spatio-Temporal Attention Network for Video Instance Segmentation
TLDR
The spatio-temporal attention network estimates the global correlation map between successive frames and converts it into an attention map; augmented with this attention information, the new features may enhance the response of instances of the pre-defined categories.
Video Panoptic Segmentation
TLDR
A novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames is proposed.
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation
TLDR
The method, named MaskProp, adapts the popular Mask R-CNN to video by adding a mask propagation branch that propagates frame-level object instance masks from each video frame to all the other frames in a video clip to predict clip-level instance tracks with respect to the object instances segmented in the middle frame of the clip.
Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation
TLDR
This work identifies the mask quality due to temporal stability as a performance bottleneck and proposes a video instance segmentation method that alleviates the problem due to missing detections, by leveraging temporal context using inter-frame attentions.
...

References

SHOWING 1-10 OF 45 REFERENCES
One-Shot Video Object Segmentation
TLDR
One-Shot Video Object Segmentation (OSVOS), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot).
YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark
TLDR
A new large-scale video object segmentation dataset called YouTube Video Object Segmentation dataset (YouTube-VOS) is built which aims to establish baselines for the development of new algorithms in the future.
Learning Video Object Segmentation from Static Images
TLDR
It is demonstrated that highly accurate object segmentation in videos can be enabled by using a convolutional neural network (convnet) trained with static images only, and a combination of offline and online learning strategies is used.
Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning
TLDR
The proposed method supports different kinds of user input such as segmentation mask in the first frame (semi-supervised scenario), or a sparse set of clicked points (interactive scenario), and reaches comparable quality to competing methods with much less interaction.
The 2017 DAVIS Challenge on Video Object Segmentation
TLDR
The scope of the benchmark, the main characteristics of the dataset, the evaluation metrics of the competition, and a detailed analysis of the results of the participants to the challenge are described.
Efficient Video Object Segmentation via Network Modulation
TLDR
This work proposes a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object and is 70× faster than fine-tuning approaches and achieves similar accuracy.
A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
TLDR
This work presents a new benchmark dataset and evaluation methodology for the area of video object segmentation, named DAVIS (Densely Annotated VIdeo Segmentation), and provides a comprehensive analysis of several state-of-the-art segmentation approaches using three complementary metrics.
Low-Latency Video Semantic Segmentation
TLDR
A framework for video semantic segmentation is developed, which incorporates two novel components: a feature propagation module that adaptively fuses features over time via spatially variant convolution, thus reducing the cost of per-frame computation; and an adaptive scheduler that dynamically allocates computation based on accuracy prediction.
Instance-Aware Semantic Segmentation via Multi-task Network Cascades
  • Jifeng Dai, Kaiming He, Jian Sun
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
TLDR
This paper presents Multitask Network Cascades for instance-aware semantic segmentation, which consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects, and develops an algorithm for the nontrivial end-to-end training of this causal, cascaded structure.
...