Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation

  • Hyojin Park, Jayeon Yoo, Seohyeong Jeong, Ganesh Venkatesh, Nojun Kwak
  • Published 22 December 2020
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Current state-of-the-art approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame. This results in high-quality segmentation across challenging scenarios such as changes in appearance and occlusion, but it also leads to unnecessary computation for stationary or slow-moving objects, where the change across frames is minimal. In this work, we exploit this observation by using temporal…

Video Object of Interest Segmentation

This work constructs a large-scale dataset called LiveVideos, which contains 2418 pairs of target images and live videos with instance-level annotations, and designs a transformer-based method to fuse video and image features.

LVOS: A Benchmark for Long-term Video Object Segmentation

A new benchmark dataset and evaluation methodology named LVOS, which consists of 220 videos with a total duration of 421 minutes and is, to the best of the authors' knowledge, the first densely annotated long-term VOS dataset.

Visual Semantic Segmentation Based on Few/Zero-Shot Learning: An Overview

This paper focuses on recently published few/zero-shot visual semantic segmentation methods, ranging from 2D to 3D space, and explores the commonalities and discrepancies of technical solutions under different segmentation circumstances.

Adaptive Online Mutual Learning Bi-Decoders for Video Object Segmentation

An adaptive online framework for VOS with bi-decoder mutual learning that adapts during inference to challenging scenarios, including unseen categories, object deformation, and appearance variation, and demonstrates the superiority of the proposed model over state-of-the-art methods.

Global Spectral Filter Memory Network for Video Object Segmentation

This paper proposes the Global Spectral Filter Memory network (GSFM), which improves intra-frame interaction by learning long-term spatial dependencies in the spectral domain, together with a Low (High) Frequency Module designed to fit this circumstance.

Depth Quality-Inspired Feature Manipulation for Efficient RGB-D and Video Salient Object Detection

An efficient depth quality-inspired feature manipulation (DQFM) process, which can dynamically change depth features according to depth quality and maintains state-of-the-art accuracy even when compared to non-efficient models.

Region Aware Video Object Segmentation with Deep Motion Modeling

A Region Aware Video Object Segmentation (RAVOS) approach that predicts regions of interest (ROIs) for efficient object segmentation and memory storage, and proposes motion path memory to wipe out redundant context by memorizing the features within the motion path of objects between two frames.

Learning Quality-aware Dynamic Memory for Video Object Segmentation

This work proposes a QDMN to evaluate the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames to prevent the error accumulation problem and significantly improves performance.

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

XMem is presented, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model that greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets.

Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation

A Language-Bridged Duplex Transfer (LBDT) module is proposed which utilizes language as an intermediary bridge to accomplish explicit and adaptive spatial-temporal interaction earlier in the encoding phase.



A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation

This work presents a new benchmark dataset and evaluation methodology for the area of video object segmentation, named DAVIS (Densely Annotated VIdeo Segmentation), and provides a comprehensive analysis of several state-of-the-art segmentation approaches using three complementary metrics.

Learning Fast and Robust Target Models for Video Object Segmentation

This work proposes a novel VOS architecture consisting of two network components, exclusively trained offline, designed to process the coarse scores into high quality segmentation masks, and achieves favorable performance, while operating at higher frame-rates compared to state-of-the-art.

TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

A novel semi-VOS model based on a template matching method and a novel temporal consistency loss is introduced to reduce the performance gap from heavy models while substantially speeding up inference.

Convolutional Networks with Adaptive Inference Graphs

This work proposes convolutional networks with adaptive inference graphs (ConvNet-AIG) that adaptively define their network topology conditioned on the input image, and observes that ConvNet-AIG shows a higher robustness than ResNets, complementing other known defense mechanisms.

Temporally Distributed Networks for Fast Video Semantic Segmentation

We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. We observe that features extracted from a certain high-level layer of a deep CNN can be

Learning Dynamic Routing for Semantic Segmentation

  • Yanwei Li, Lin Song, Jian Sun
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing, which generates data-dependent routes, adapting to the scale distribution of each image, and compares with several static architectures, which can be modeled as special cases in the routing space.

State-Aware Tracker for Real-Time Video Object Segmentation

This work proposes a novel pipeline called State-Aware Tracker (SAT), which produces accurate segmentation results at real-time speed by taking advantage of inter-frame consistency and treating each target object as a tracklet.

Fast Video Object Segmentation using the Global Context Module

A real-time, high-quality semi-supervised video object segmentation algorithm that effectively summarizes and propagates information through the entire video, and uses constant memory regardless of the video length and costs substantially less memory and computation.

SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder

The new extremely lightweight portrait segmentation model SINet is introduced, containing an information blocking decoder and spatial squeeze modules, and it is demonstrated that the method can be used for general semantic segmentation on the Cityscapes dataset.

Fast Video Object Segmentation via Dynamic Targeting Network

Experimental results on two public datasets demonstrate that the proposed model significantly outperforms existing methods without online training in both accuracy and efficiency, and is comparable to online training-based methods in accuracy with an order of magnitude faster speed.