Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation

Xiaoxiao Li, Chen Change Loy
The problem of video object segmentation can become extremely challenging when multiple instances co-exist. While each instance may exhibit large scale and pose variations, the problem is compounded when instances occlude each other causing failures in tracking. In this study, we formulate a deep recurrent network that is capable of segmenting and tracking objects in video simultaneously by their temporal continuity, yet able to re-identify them when they re-appear after a prolonged occlusion… 
OVSNet: Towards One-Pass Real-Time Video Object Segmentation
A unified One-Pass Video Segmentation framework (OVS-Net) for modeling spatial-temporal representation in a single pipeline, which seamlessly integrates object detection, object segmentation, and object re-identification.
Mask-Ranking Network for Semi-supervised Video Object Segmentation
A novel architecture named Mask-Ranking Network (MRNet), which combines the advantages of propagation-based and matching-based methods for video object segmentation, better handles object deformation, and makes the segmentation results more accurate.
Bilateral Temporal Re-Aggregation for Weakly-Supervised Video Object Segmentation
This paper proposes to capture temporal dependencies and gather information from multiple frames through bilateral temporal re-aggregation, building an efficient and competent aggregation process that fully exploits the video context during inference.
Video Object Segmentation Based on Location and RoIAlign in Weight Modulated Multi-Scale Network
This work proposes a fast and effective video object segmentation method that does not rely on fine-tuning; it extracts the semantic information of the annotated object in the first frame to generate corresponding channel-wise weights, re-targeting the network to locate and segment the specific object accurately.
Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching
“Tracking-by-detection” is introduced into VOS, coherently integrating segmentation into tracking by proposing a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
Video Object Segmentation Using Global and Instance Embedding Learning
The proposed network learns to differentiate multiple instances and associate them properly in a single feed-forward pass by exploiting the relations among instances within each frame as well as temporal relations across frames.
Mask Selection and Propagation for Unsupervised Video Object Segmentation
This work introduces a novel idea of assessing mask quality using a neural network called Selector Net, trained in such a way that it generalizes across various datasets and is able to limit the noise accumulated along the video.
Learning Fast and Robust Target Models for Video Object Segmentation
This work proposes a novel VOS architecture consisting of two network components, trained exclusively offline, designed to turn coarse scores into high-quality segmentation masks; it achieves favorable performance while operating at higher frame rates than the state of the art.
Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation
This work proposes a unified framework consisting of object proposal, tracking, and segmentation components that achieves state-of-the-art performance on several video object segmentation benchmarks.
LIP: Learning Instance Propagation for Video Object Segmentation
This paper proposes a single end-to-end trainable deep neural network, convolutional gated recurrent Mask-RCNN, for tackling the semi-supervised VOS task, which takes advantage of both the instance segmentation network (Mask-RCNN) and the visual memory module (Conv-GRU).
Instance Re-Identification Flow for Video Object Segmentation
An Instance Re-Identification Flow (IRIF) for video object segmentation is proposed, together with multi-SVM classifiers that embed history references with several unary components, namely saliency, CNN features, location, and color, to segment each object instance within its possible bounding box in each frame.
Video Object Segmentation with Re-identification
The Video Object Segmentation with Re-identification (VS-ReID) model includes a mask propagation module and a ReID module that produces an initial probability map by flow warping and retrieves missing instances by adaptive matching.
Key-segments for video object segmentation
The method first identifies object-like regions in any frame according to both static and dynamic cues, then computes a series of binary partitions among candidate “key-segments” to discover hypothesis groups with persistent appearance and motion.
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
Online Adaptive Video Object Segmentation (OnAVOS) is proposed which updates the network online using training examples selected based on the confidence of the network and the spatial configuration and adds a pretraining step based on objectness, which is learned on PASCAL.
Supervoxel-Consistent Foreground Propagation in Video
This work proposes a higher order supervoxel label consistency potential for semi-supervised foreground segmentation, leveraging bottom-up supervoxels to guide its estimates towards long-range coherent regions.
Learning Video Object Segmentation from Static Images
It is demonstrated that highly accurate object segmentation in videos can be enabled by using a convolutional neural network (convnet) trained with static images only, and a combination of offline and online learning strategies are used.
Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals
  • Fanyi Xiao, Yong Jae Lee
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
This paper presents an unsupervised approach that generates a diverse, ranked set of bounding box and segmentation video object proposals (spatio-temporal tubes that localize the foreground objects) in an unannotated video, and demonstrates state-of-the-art segmentation results on the SegTrack v2 dataset.
Video Segmentation by Tracking Many Figure-Ground Segments
An unsupervised video segmentation approach that simultaneously tracks multiple holistic figure-ground segments and outperforms state-of-the-art approaches on the dataset, showing its efficiency and robustness to challenges in different video sequences.
Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks
We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNN). Our network aims to distinguish the target area from the background…
Video Segmentation via Object Flow
This work formulates a principled, multiscale, spatio-temporal objective function that uses optical flow to propagate information between frames for video segmentation, and demonstrates the effectiveness of jointly optimizing optical flow and video segmentation using an iterative scheme.