Region Aware Video Object Segmentation with Deep Motion Modeling

Bo Miao, Bennamoun, Yongsheng Gao, Ajmal S. Mian
Current semi-supervised video object segmentation (VOS) methods usually use the features of the entire frame to predict object masks and update memory, which introduces significant redundant computation. To reduce this redundancy, we present a Region Aware Video Object Segmentation (RAVOS) approach that predicts regions of interest (ROIs) for efficient object segmentation and memory storage. RAVOS includes a fast object motion tracker that predicts the objects' ROIs in the next frame. For efficient…
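The core idea of restricting work to a predicted ROI can be illustrated with a small sketch. The summary does not describe how RAVOS's tracker works, so the sketch substitutes a hypothetical constant-velocity box extrapolation with a fixed margin (`predict_next_roi` and `crop_roi` are illustrative names, not from the paper), then crops a feature map to the predicted box so later stages process fewer positions.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); x1 and y1 are exclusive
Grid = List[List[float]]


def predict_next_roi(prev: Box, curr: Box, margin: float,
                     width: int, height: int) -> Tuple[int, int, int, int]:
    """Extrapolate an object box one frame ahead.

    Hypothetical stand-in for RAVOS's motion tracker: assumes constant
    velocity (next = curr + (curr - prev)), pads the box by `margin`
    pixels and clamps it to the frame.
    """
    nx0, ny0, nx1, ny1 = (2 * c - p for p, c in zip(prev, curr))
    return (max(0, math.floor(nx0 - margin)),
            max(0, math.floor(ny0 - margin)),
            min(width, math.ceil(nx1 + margin)),
            min(height, math.ceil(ny1 + margin)))


def crop_roi(features: Grid, roi: Tuple[int, int, int, int]) -> Grid:
    """Keep only the feature-map cells inside the ROI, so later stages
    (mask decoding, memory writes) touch fewer positions."""
    x0, y0, x1, y1 = roi
    return [row[x0:x1] for row in features[y0:y1]]


# Usage: an object moving right by 2 px per frame on a 64x48 feature map.
roi = predict_next_roi((10, 10, 20, 20), (12, 10, 22, 20), margin=2, width=64, height=48)
patch = crop_roi([[0.0] * 64 for _ in range(48)], roi)
print(roi, len(patch), len(patch[0]))  # (12, 8, 26, 22) 14 14
```

Here only 14 × 14 of the 64 × 48 cells reach the later stages; the margin trades a little extra computation for robustness to motion the extrapolation misses.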

Delving Deeper Into Mask Utilization in Video Object Segmentation

A new architecture, MaskVOS, is proposed that fully exploits masks for VOS; it introduces a mask-enhanced matcher that reduces background distraction and enhances the locality of the matching process.

Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy

IMAS greatly improves segmentation quality on several common UVOS benchmarks and surpasses previous methods by 8.3% on the DAVIS16 benchmark, using only a standard ResNet and convolutional heads.



MoNet: Deep Motion Exploitation for Video Object Segmentation

MoNet, a novel model that deeply exploits motion cues to boost video object segmentation performance from two aspects (frame representation learning and segmentation refinement), achieves new state-of-the-art performance on three competitive benchmark datasets.

Kernelized Memory Network for Video Object Segmentation

A kernelized memory network (KMN) is proposed that surpasses the state of the art on standard benchmarks by a significant margin; it uses the Hide-and-Seek strategy during pre-training to better handle occlusions and extract accurate segment boundaries.
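The summary does not spell out the kernel itself. As we understand KMN, each memory pixel is matched to the query location it resembles most, and a 2-D Gaussian centred there limits which query pixels that memory pixel can influence. The sketch below illustrates that idea with plain lists; the function name, the 1/√d scaling and the `sigma` parameter are assumptions of this sketch rather than the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]
Pos = Tuple[int, int]  # (row, col) of a query pixel


def kernelized_read(mem_keys: List[Vec], mem_vals: List[Vec],
                    query_keys: List[Vec], query_pos: List[Pos],
                    sigma: float) -> List[List[float]]:
    """Memory read with a Gaussian kernel (sketch of the KMN idea).

    For every memory pixel p we find the query position it matches best
    (memory-to-query matching) and centre a 2-D Gaussian there, so p can
    only contribute strongly to query pixels near that location.
    """
    scale = 1.0 / math.sqrt(len(query_keys[0]))
    # c[p][q]: scaled dot-product similarity between memory p and query q
    c = [[scale * sum(a * b for a, b in zip(mk, qk)) for qk in query_keys]
         for mk in mem_keys]
    # best[p]: query position that memory pixel p matches best
    best = [query_pos[max(range(len(query_pos)), key=row.__getitem__)] for row in c]

    out = []
    for q, (qy, qx) in enumerate(query_pos):
        # log of g(p, q) * exp(c[p][q]), kept in log space for stability
        logits = [c[p][q] - ((qy - by) ** 2 + (qx - bx) ** 2) / (2 * sigma ** 2)
                  for p, (by, bx) in enumerate(best)]
        m = max(logits)
        w = [math.exp(z - m) for z in logits]
        total = sum(w)  # >= 1, because the largest term is exp(0)
        out.append([sum(wi * v[i] for wi, v in zip(w, mem_vals)) / total
                    for i in range(len(mem_vals[0]))])
    return out
```

As `sigma` grows the Gaussian term flattens and the read reduces to an ordinary softmax over memory, so the kernel width directly controls how local the matching is.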

Video Object Segmentation Using Space-Time Memory Networks

This work proposes a novel solution for semi-supervised video object segmentation that leverages memory networks and learns to read relevant information from all available sources, to better handle challenges such as appearance changes and occlusions.
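Reading "relevant information from all available sources" amounts to an attention-style lookup: every pixel of the current (query) frame compares its key with the keys of all pixels in all stored frames, and reads a similarity-weighted sum of their values. The sketch below shows that read on plain lists; the 1/√d scaling and the function names are assumptions, and the encoders that produce keys and values are omitted.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]


def space_time_memory_read(mem_keys: List[Vec], mem_vals: List[Vec],
                           query_keys: List[Vec], query_vals: List[Vec]) -> List[List[float]]:
    """Attention-style memory read in the spirit of STM.

    mem_keys/mem_vals hold one entry per pixel of every stored frame
    (frames flattened together); each query pixel attends over all of them.
    """
    scale = 1.0 / math.sqrt(len(query_keys[0]))
    out = []
    for qk, qv in zip(query_keys, query_vals):
        # Similarity to every memory pixel, normalised over the whole memory.
        w = softmax([scale * sum(a * b for a, b in zip(mk, qk)) for mk in mem_keys])
        # Weighted sum of memory values ...
        read = [sum(wi * mv[i] for wi, mv in zip(w, mem_vals))
                for i in range(len(mem_vals[0]))]
        # ... concatenated with the query pixel's own value features.
        out.append(list(qv) + read)
    return out
```

Because the memory is just a longer list of keys and values, adding more past frames needs no architectural change, only more computation per read.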

Fast and Accurate Online Video Object Segmentation via Tracking Parts

This paper proposes a fast and accurate video object segmentation algorithm that can start segmenting as soon as it receives the images; it performs favorably against state-of-the-art algorithms in accuracy on the DAVIS benchmark dataset while achieving much faster runtime.

Learning Video Object Segmentation from Static Images

It is demonstrated that highly accurate object segmentation in videos can be achieved with a convolutional neural network (convnet) trained on static images only, combined with offline and online learning strategies.
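One way a network trained on single images can be used for video, and, as we understand it, the approach of this work, is to feed it the previous frame's rough mask as a fourth input channel, simulating that rough mask on static images by deforming the ground truth. The sketch below prepares such inputs using a random shift followed by dilation; the deformation choices and all function names are hypothetical, and the convnet itself is omitted.

```python
import random
from typing import List

Mask = List[List[int]]


def shift(mask: Mask, dy: int, dx: int) -> Mask:
    """Translate a binary mask by (dy, dx), filling uncovered pixels with 0."""
    h, w = len(mask), len(mask[0])
    return [[mask[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else 0
             for x in range(w)] for y in range(h)]


def dilate(mask: Mask, r: int) -> Mask:
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[yy][xx]
                     for yy in range(max(0, y - r), min(h, y + r + 1))
                     for xx in range(max(0, x - r), min(w, x + r + 1))))
             for x in range(w)] for y in range(h)]


def synthesize_prev_mask(gt: Mask, max_shift: int, radius: int,
                         rng: random.Random) -> Mask:
    """Fake a 'previous-frame' mask estimate from a static-image ground truth."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return dilate(shift(gt, dy, dx), radius)


def make_input(rgb: List[List[List[float]]], prev_mask: Mask) -> List[List[List[float]]]:
    """Stack an H x W x 3 RGB image with the rough mask into an H x W x 4 input."""
    return [[list(px) + [float(m)] for px, m in zip(row, mrow)]
            for row, mrow in zip(rgb, prev_mask)]
```

Since the rough mask is deliberately imprecise, the network learns to refine a coarse hint rather than copy it, which is what it will face at test time when the hint comes from its own previous prediction.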

VideoMatch: Matching based Video Object Segmentation

This work develops a novel matching-based algorithm for video object segmentation that learns to match extracted features to a provided template without memorizing the appearance of the objects.
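Matching against a template instead of memorizing appearance can be sketched as follows: features from the annotated frame are split into foreground and background templates, and each test pixel is scored by the mean of its k best cosine similarities to each set. This reflects our reading of VideoMatch's soft matching; the value of k, the plain two-way softmax and the function names are assumptions.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def topk_mean(scores: List[float], k: int) -> float:
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)


def soft_match(test_feats: List[Vec], fg_template: List[Vec],
               bg_template: List[Vec], k: int) -> List[float]:
    """Foreground probability per test pixel from template matching.

    Each test feature is compared (cosine similarity) with every template
    feature; the mean of its k best matches gives a foreground and a
    background score, turned into a probability with a 2-way softmax.
    """
    probs = []
    for f in test_feats:
        s_fg = topk_mean([cosine(f, t) for t in fg_template], k)
        s_bg = topk_mean([cosine(f, t) for t in bg_template], k)
        probs.append(1.0 / (1.0 + math.exp(s_bg - s_fg)))  # softmax over (fg, bg)
    return probs
```

Nothing here is trained per video: the templates are simply stored features, so a new object only changes the template lists, not any weights.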

Online Adaptation of Convolutional Neural Networks for Video Object Segmentation

Online Adaptive Video Object Segmentation (OnAVOS) is proposed, which updates the network online using training examples selected based on the network's confidence and the spatial configuration; it also adds a pretraining step based on objectness, learned on PASCAL.
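The example selection can be sketched on a small probability map: pixels the network is very confident about become positives, pixels far from the last predicted mask become negatives, and the rest are ignored during the online update. The thresholds, the tie-breaking rule and the function name below are assumptions of this sketch.

```python
import math
from typing import List

POS, NEG, IGNORE = 1, 0, -1


def select_online_examples(fg_prob: List[List[float]], last_mask: List[List[int]],
                           pos_thresh: float, neg_dist: float) -> List[List[int]]:
    """Label pixels for one online fine-tuning step (sketch).

    Positives: pixels the network is very confident are foreground.
    Negatives: pixels farther than `neg_dist` from the last predicted mask.
    Everything else is ignored. Negatives win ties (an assumption).
    """
    h, w = len(fg_prob), len(fg_prob[0])
    fg_pixels = [(y, x) for y in range(h) for x in range(w) if last_mask[y][x]]
    labels = [[IGNORE] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Brute-force distance to the nearest pixel of the last mask;
            # with an empty mask no pixel can be declared a negative.
            if fg_pixels and min(math.hypot(y - fy, x - fx)
                                 for fy, fx in fg_pixels) > neg_dist:
                labels[y][x] = NEG
            elif fg_prob[y][x] > pos_thresh:
                labels[y][x] = POS
    return labels
```

The brute-force distance loop is only for clarity; on real frames a Euclidean distance transform of the inverted mask gives the same distances far more cheaply.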

Learning Fast and Robust Target Models for Video Object Segmentation

This work proposes a novel VOS architecture consisting of two network components: a lightweight target appearance model that is learned during inference to predict coarse target scores, and a segmentation network, trained exclusively offline, that processes the coarse scores into high-quality segmentation masks. The method achieves favorable performance while operating at higher frame rates than state-of-the-art methods.
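A target model learned at inference time can be illustrated with the simplest possible case: a linear per-pixel model that maps backbone features to a target score, fitted on the annotated first frame. This sketch swaps in closed-form ridge regression, which is not the paper's method (FRTM trains a small convolutional model with an iterative optimizer); the function names and the regularizer `lam` are hypothetical.

```python
from typing import List, Sequence

Vec = Sequence[float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_target_model(feats: List[Vec], labels: List[float], lam: float) -> List[float]:
    """Ridge regression: w = (F^T F + lam I)^-1 F^T y, one row of F per pixel."""
    d = len(feats[0])
    ftf = [[sum(f[i] * f[j] for f in feats) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    fty = [sum(f[i] * y for f, y in zip(feats, labels)) for i in range(d)]
    return solve(ftf, fty)


def coarse_scores(feats: List[Vec], w: Vec) -> List[float]:
    """Per-pixel target scores that a refinement network would then sharpen."""
    return [sum(a * b for a, b in zip(f, w)) for f in feats]
```

Fitting only a handful of weights per video is what keeps this online step cheap; all of the heavy lifting stays in the offline-trained refinement stage.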

Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation

A novel dynamic network is proposed that estimates the change across frames and, depending on the expected similarity, decides which path to take: computing the full network or reusing the previous frame's features.
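The reuse decision can be sketched as a wrapper around an expensive feature extractor. The change measure (mean absolute pixel difference), the fixed threshold and the class name are hypothetical placeholders for the paper's own change estimate.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # flattened pixels, for illustration


def mean_abs_diff(a: Frame, b: Frame) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class ReuseGate:
    """Run the expensive path only when the frame has changed enough.

    The change measure and threshold are placeholders, not the paper's gate.
    """

    def __init__(self, backbone: Callable[[Frame], List[float]], threshold: float):
        self.backbone = backbone
        self.threshold = threshold
        self.ref_frame: Optional[Frame] = None  # frame the cached features came from
        self.cached: Optional[List[float]] = None
        self.full_runs = 0

    def __call__(self, frame: Frame) -> List[float]:
        if self.ref_frame is not None and mean_abs_diff(frame, self.ref_frame) < self.threshold:
            return self.cached  # reuse the previous features
        self.cached = self.backbone(frame)  # compute the full network
        self.ref_frame = frame
        self.full_runs += 1
        return self.cached
```

Comparing against the frame the cached features came from, rather than the immediately preceding frame, means slow gradual drift eventually exceeds the threshold and forces a recomputation instead of being reused indefinitely.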

RVOS: End-To-End Recurrent Network for Video Object Segmentation

This work proposes RVOS, a recurrent network for multiple-object video object segmentation that is fully end-to-end trainable and achieves faster inference than previous methods, reaching 44 ms/frame on a P100 GPU.