XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Ho Kei Cheng and Alexander G. Schwing

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically uses only one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected…


LVOS: A Benchmark for Long-term Video Object Segmentation

A new benchmark dataset and evaluation methodology named LVOS, which consists of 220 videos with a total duration of 421 minutes and is, to the best of the authors' knowledge, the first densely annotated long-term VOS dataset.

A Generalized Framework for Video Instance Segmentation

A query-based training pipeline for sequential learning, using a novel target label assignment strategy, that achieves state-of-the-art performance on challenging benchmarks without complicated architecture design or extra post-processing.

Hierarchical Memory Matching Network for Video Object Segmentation

A hierarchical memory matching scheme is introduced and a top-k guided memory matching module is proposed in which memory read on a fine-scale is guided by that on a coarse-scale, leading to accurate memory retrieval.
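Top-k guided memory matching, as summarized above, amounts to keeping only the k strongest affinities per query location before normalizing. A minimal NumPy sketch of that idea (shapes and names are illustrative assumptions, not HMMN's actual implementation):

```python
import numpy as np

def topk_softmax_read(query_key, mem_keys, mem_values, k=3):
    """Memory read that keeps only the top-k affinities per query
    location before the softmax. Shapes are illustrative:
    query_key (C, Q), mem_keys (C, M), mem_values (D, M)."""
    affinity = mem_keys.T @ query_key                   # (M, Q)
    # k-th largest affinity in each column (per query location).
    kth = np.partition(affinity, -k, axis=0)[-k]        # (Q,)
    # Suppress everything below the top-k before normalizing.
    masked = np.where(affinity >= kth, affinity, -np.inf)
    masked -= masked.max(axis=0, keepdims=True)         # numeric stability
    w = np.exp(masked)
    w /= w.sum(axis=0, keepdims=True)                   # softmax over memory
    return mem_values @ w                               # (D, Q)

rng = np.random.default_rng(1)
q = rng.standard_normal((16, 5))    # query-frame keys
mk = rng.standard_normal((16, 12))  # memory keys
mv = rng.standard_normal((8, 12))   # memory values
r = topk_softmax_read(q, mk, mv, k=3)
print(r.shape)  # (8, 5)
```

Sparsifying the affinity this way limits each query location to its most similar memory locations, which is what makes the fine-scale read cheap to guide from a coarse-scale match.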

Video Object Segmentation with Episodic Graph Memory Networks

This work exploits an episodic memory network, organized as a fully connected graph, storing frames as nodes and capturing cross-frame correlations by edges; it yields a neat yet principled framework that generalizes well to both one-shot and zero-shot video object segmentation tasks.

Learning Position and Target Consistency for Memory-based Video Object Segmentation

LCM introduces an object-level relationship from the target to maintain target consistency, making it more robust to error drifting, and achieves state-of-the-art performance on both the DAVIS and YouTube-VOS benchmarks.

Video Object Segmentation Using Space-Time Memory Networks

This work proposes a novel solution for semi-supervised video object segmentation by leveraging memory networks and learning to read relevant information from all available sources to better handle challenges such as appearance changes and occlusions.
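At its core, the space-time memory read described above is attention over key-value pairs stored from past frames. A minimal NumPy sketch under assumed shapes (hypothetical names, not the paper's implementation):

```python
import numpy as np

def memory_read(query_key, mem_keys, mem_values):
    """Attend over memory: query_key (C, HW_q), mem_keys (C, THW_m),
    mem_values (D, THW_m). Returns a readout of shape (D, HW_q)."""
    # Affinity between every query location and every memory location.
    affinity = mem_keys.T @ query_key             # (THW_m, HW_q)
    # Softmax over the memory dimension: each query location reads
    # a convex combination of memory values.
    affinity -= affinity.max(axis=0, keepdims=True)
    weights = np.exp(affinity)
    weights /= weights.sum(axis=0, keepdims=True)
    return mem_values @ weights                   # (D, HW_q)

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 10))   # query-frame keys
mk = rng.standard_normal((64, 30))  # memory keys (e.g. 3 frames x 10 locations)
mv = rng.standard_normal((32, 30))  # memory values
out = memory_read(q, mk, mv)
print(out.shape)                    # (32, 10)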

Efficient Regional Memory Network for Video Object Segmentation

The proposed RM-Net effectively alleviates the ambiguity of similar objects in both memory and query frames, which allows the information to be passed from the regional memory to the query region efficiently and effectively.

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

A simple yet effective approach to modeling space-time correspondences in the context of video object segmentation that achieves new state-of-the-art results on both the DAVIS and YouTube-VOS datasets while running significantly faster, at 20+ FPS for multiple objects, without bells and whistles.

Fast Video Object Segmentation using the Global Context Module

A real-time, high-quality semi-supervised video object segmentation algorithm that effectively summarizes and propagates information through the entire video, uses constant memory regardless of video length, and costs substantially less memory and computation.

Kernelized Memory Network for Video Object Segmentation

A kernelized memory network (KMN) is proposed that surpasses the state of the art on standard benchmarks by a significant margin; it uses the Hide-and-Seek strategy in pre-training to better handle occlusions and segment-boundary extraction.

SwiftNet: Real-time Video Object Segmentation

In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% $\mathcal{J}\&\mathcal{F}$ and 70 FPS on the DAVIS 2017 validation dataset.

RVOS: End-To-End Recurrent Network for Video Object Segmentation

This work proposes a Recurrent network for multiple-object Video Object Segmentation (RVOS) that is fully end-to-end trainable and achieves faster inference runtimes than previous methods, reaching 44 ms/frame on a P100 GPU.