Corpus ID: 236447698

Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation

  title={Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation},
  author={Bo Miao and Mohammed Bennamoun and Yongsheng Gao and Ajmal S. Mian},
We propose a self-supervised spatio-temporal matching method coined Motion-Aware Mask Propagation (MAMP) for semi-supervised video object segmentation. During training, MAMP leverages the frame reconstruction task to train the model without the need for annotations. During inference, MAMP extracts high-resolution features from each frame to build a memory bank from the features as well as the predicted masks of selected past frames. MAMP then propagates the masks from the memory bank to… Expand

Figures and Tables from this paper


Kernelized Memory Network for Video Object Segmentation
A kernelized memory network (KMN) is proposed that surpasses the state-of-the-art on standard benchmarks by a significant margin and uses the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and segment boundary extraction. Expand
RPM-Net: Robust Pixel-Level Matching Networks for Self-Supervised Video Object Segmentation
Robust Pixel-level Matching Networks (RPM-Net) is presented, a novel deep architecture that matches pixels between adjacent frames, using only color information from unlabeled videos for training, which significantly reduces the performance gap between self-supervised and fully- supervised video object segmentation. Expand
PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation
This work addresses semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations, with the PReMVOS algorithm. Expand
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
Online Adaptive Video Object Segmentation (OnAVOS) is proposed which updates the network online using training examples selected based on the confidence of the network and the spatial configuration and adds a pretraining step based on objectness, which is learned on PASCAL. Expand
Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation
A deep recurrent network that is capable of segmenting and tracking objects in video simultaneously by their temporal continuity, yet able to re-identify them when they re-appear after a prolonged occlusion is formulated. Expand
Self-supervised Learning for Video Correspondence Flow
A simple information bottleneck is introduced that forces the model to learn robust features for correspondence matching, and prevents it from learning trivial solutions, as well as probing the upper bound by training on additional data, further demonstrating significant improvements on video segmentation. Expand
Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning
The proposed method supports different kinds of user input such as segmentation mask in the first frame (semi-supervised scenario), or a sparse set of clicked points (interactive scenario), and reaches comparable quality to competing methods with much less interaction. Expand
Fast Video Object Segmentation by Reference-Guided Mask Propagation
A deep Siamese encoder-decoder network is proposed that is designed to take advantage of mask propagation and object detection while avoiding the weaknesses of both approaches, and achieves accuracy competitive with state-of-the-art methods while running in a fraction of time compared to others. Expand
Video Object Segmentation Using Space-Time Memory Networks
This work proposes a novel solution for semi-supervised video object segmentation by leveraging memory networks and learning to read relevant information from all available sources to better handle the challenges such as appearance changes and occlussions. Expand
Learning Video Object Segmentation from Static Images
It is demonstrated that highly accurate object segmentation in videos can be enabled by using a convolutional neural network (convnet) trained with static images only, and a combination of offline and online learning strategies are used. Expand