SwiftNet: Real-time Video Object Segmentation

@article{Wang2021SwiftNetRV,
  title={SwiftNet: Real-time Video Object Segmentation},
  author={Haochen Wang and Xiaolong Jiang and Haibing Ren and Yao Hu and Song Bai},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={1296-1305}
}
  • Haochen Wang, Xiaolong Jiang, Haibing Ren, Yao Hu, Song Bai
  • Published 9 February 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% $\mathcal{J}\&\mathcal{F}$ and 70 FPS on the DAVIS 2017 validation set, leading all present solutions in overall accuracy and speed. We achieve this by elaborately compressing spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM). Temporally, PAM adaptively triggers memory updates on frames where objects display noteworthy inter-frame… 
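The abstract is truncated, but its central mechanism, writing to memory only on frames with noteworthy inter-frame change, can be illustrated with a minimal sketch. Everything below (function names, the use of raw masks, the thresholds) is a hypothetical illustration, not SwiftNet's actual PAM, which gates updates at the pixel/feature level.

import torch

def should_update_memory(prev_mask: torch.Tensor,
                         curr_mask: torch.Tensor,
                         change_ratio: float = 0.05) -> bool:
    """Write to memory only when the object changed enough between frames.

    prev_mask, curr_mask: (H, W) soft foreground probabilities in [0, 1].
    change_ratio: fraction of pixels that must change to trigger an update.
    """
    variation = (curr_mask - prev_mask).abs()           # pixel-wise variation
    changed = (variation > 0.5).float().mean().item()   # fraction of changed pixels
    return changed > change_ratio

# Toy usage: only frames with enough inter-frame variation are memorized.
memory, prev = [], torch.zeros(480, 854)
for _ in range(5):
    curr = torch.rand(480, 854)   # stand-in for the mask predicted at frame t
    if should_update_memory(prev, curr):
        memory.append(curr)       # SwiftNet would store keys/values, not masks
    prev = curr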

Recurrent Dynamic Embedding for Video Object Segmentation

This paper proposes a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size, explicitly generated and updated by the proposed Spatio-temporal Aggregation Module (SAM), which exploits the cue of historical information.

Occluded Video Instance Segmentation: A Benchmark

A simple plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion is presented, and a remarkable AP improvement on the OVIS dataset is obtained.

Self-Supervised Sidewalk Perception Using Fast Video Semantic Segmentation for Robotic Wheelchairs in Smart Mobility

A visual perception method is developed for robotic wheelchairs operating in urban sidewalk environments, and it serves as a reference for transferring and developing perception algorithms for cross-domain visual perception applications with less downtime.

SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

A novel Sequential Weighted Expectation-Maximization (SWEM) network is proposed to greatly reduce the redundancy of memory features; it maintains a fixed number of template features in memory, which ensures stable inference complexity for the VOS system.
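The fixed-size memory idea can be sketched as an EM-style compression of an arbitrary number of incoming features into K weighted template bases. The code below is an illustrative toy under that reading, with made-up names and shapes; it is not the paper's SWEM algorithm.

import torch
import torch.nn.functional as F

def weighted_em_compress(features, bases, weights, n_iters=3):
    """Compress N feature vectors into K template bases via weighted EM.

    features: (N, C) memory features accumulated over frames.
    bases:    (K, C) current template features (K stays fixed).
    weights:  (N,)  per-feature importance weights (e.g. decayed with age).
    """
    for _ in range(n_iters):
        # E-step: soft-assign each feature to each basis by similarity.
        resp = F.softmax(features @ bases.t(), dim=1)        # (N, K)
        # M-step: each basis becomes a weighted mean of its assigned features.
        w = resp * weights.unsqueeze(1)                      # (N, K)
        w = w / (w.sum(dim=0, keepdim=True) + 1e-6)
        bases = F.normalize(w.t() @ features, dim=1)         # (K, C)
    return bases

# Memory stays at K = 64 templates no matter how many frames arrive.
templates = torch.randn(64, 128)
templates = weighted_em_compress(torch.randn(4096, 128), templates,
                                 torch.ones(4096))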

MA-ResNet50: A General Encoder Network for Video Segmentation

This paper designs a Partial Channel Memory Attention (PCMA) module to store and fuse time-series features from video sequences, and proposes a Memory Attention ResNet50 network (MA-ResNet50), making it the first video-based feature-extraction encoder applicable to most currently proposed segmentation networks.

Accelerating Video Object Segmentation with Compressed Video

  • Kai-yu Xu, Angela Yao
  • Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2022
An efficient plug-and-play acceleration framework for semi-supervised video object segmentation is proposed that exploits the temporal redundancies in videos exposed by the compressed bitstream, and a residual-based correction module is introduced to fix segmentation masks that are wrongly propagated by noisy or erroneous motion vectors.
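The propagation-plus-correction idea can be sketched directly on codec metadata: copy mask blocks along the motion vectors, and flag blocks with high residual energy for a correction step. The code below is a simplified illustration with hypothetical names, not the paper's framework.

import numpy as np

def propagate_mask(prev_mask, motion_vectors, residual,
                   block=16, residual_thresh=20.0):
    """Propagate a mask with block motion vectors from the compressed bitstream.

    prev_mask:      (H, W) previous-frame mask.
    motion_vectors: (H // block, W // block, 2) per-block (dy, dx) vectors.
    residual:       (H, W) residual magnitude decoded from the bitstream.
    Returns the propagated mask and a flag map of regions needing correction.
    """
    H, W = prev_mask.shape
    new_mask = np.zeros_like(prev_mask)
    needs_fix = np.zeros((H, W), dtype=bool)
    for by in range(H // block):
        for bx in range(W // block):
            y0, x0 = by * block, bx * block
            dy, dx = motion_vectors[by, bx]
            # Source block in the previous frame, clamped to the image bounds.
            sy = int(np.clip(y0 + dy, 0, H - block))
            sx = int(np.clip(x0 + dx, 0, W - block))
            new_mask[y0:y0 + block, x0:x0 + block] = \
                prev_mask[sy:sy + block, sx:sx + block]
            # High residual energy means motion compensation was unreliable,
            # so this region should be handled by the correction module.
            if residual[y0:y0 + block, x0:x0 + block].mean() > residual_thresh:
                needs_fix[y0:y0 + block, x0:x0 + block] = True
    return new_mask, needs_fix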

TransVOS: Video Object Segmentation with Transformers

This paper proposes a new transformer-based framework, termed TransVOS, which introduces a vision transformer to fully exploit and model both the temporal and spatial relationships in semi-supervised video object segmentation.

Flow-guided Semi-supervised Video Object Segmentation

An encoder-decoder approach to the segmentation task is proposed, together with a model that extracts combined information from optical flow and the image, which is then used as input to the target model and the decoder network.

AccDecoder: Accelerated Decoding for Neural-enhanced Video Analytics

AccDecoder is presented, a novel accelerated decoder for real-time and neural-enhanced video analytics that adaptively selects a few frames via Deep Reinforcement Learning (DRL) to be enhanced by neural super-resolution and then up-scales the unselected frames that reference them, leading to a 6-21% accuracy improvement.

Comparative Study of Real-Time Semantic Segmentation Networks in Aerial Images During Flooding Events

  • Farshad Safavi, M. Rahnemoonfar
  • Computer Science, Environmental Science
    IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
  • 2023
This article comprehensively studies several lightweight architectures, including encoder–decoder and two-pathway architectures, evaluates their performance on aerial imagery datasets, and benchmarks the efficiency and accuracy of different models on the FloodNet dataset to examine their practicability for aerial image segmentation during emergency response.

References

Showing 1-10 of 42 references

State-Aware Tracker for Real-Time Video Object Segmentation

This work proposes a novel pipeline called State-Aware Tracker (SAT), which produces accurate segmentation results at real-time speed by taking advantage of inter-frame consistency and treating each target object as a tracklet.

Fast Video Object Segmentation by Reference-Guided Mask Propagation

A deep Siamese encoder-decoder network is proposed that is designed to take advantage of mask propagation and object detection while avoiding the weaknesses of both approaches; it achieves accuracy competitive with state-of-the-art methods while running in a fraction of the time.

MaskRNN: Instance Level Video Object Segmentation

MaskRNN, a recurrent neural network approach that fuses, in each frame, the outputs of two deep nets for each object instance, is developed; it is able to take advantage of long-term temporal structure in the video data as well as to reject outliers.

Spatiotemporal CNN for Video Object Segmentation

A unified, end-to-end trainable spatiotemporal CNN model for VOS is presented, which consists of two branches, i.e., a temporal coherence branch and a spatial segmentation branch, designed to capture the dynamic appearance and motion cues of video sequences to guide object segmentation.

Efficient Video Object Segmentation via Network Modulation

This work proposes a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object and is 70× faster than fine-tuning approaches and achieves similar accuracy.

Video Object Segmentation Using Space-Time Memory Networks

This work proposes a novel solution for semi-supervised video object segmentation by leveraging memory networks and learning to read relevant information from all available sources to better handle challenges such as appearance changes and occlusions.
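The memory read in space-time memory approaches is, at its core, key-value attention between the query frame and all memorized frames. The sketch below shows only that read step, with made-up tensor shapes and scaling; it is not the paper's exact network.

import torch
import torch.nn.functional as F

def memory_read(query_key, mem_keys, mem_values):
    """Retrieve a value map for the query frame from a space-time memory.

    query_key:  (C_k, H, W)     key map of the current frame.
    mem_keys:   (T, C_k, H, W)  keys of the T memorized frames.
    mem_values: (T, C_v, H, W)  values of the T memorized frames.
    """
    Ck, H, W = query_key.shape
    q = query_key.reshape(Ck, H * W)                                     # (C_k, HW)
    k = mem_keys.reshape(-1, Ck, H * W).permute(1, 0, 2).reshape(Ck, -1)  # (C_k, THW)
    v = (mem_values.reshape(mem_values.shape[0], -1, H * W)
                   .permute(1, 0, 2).reshape(mem_values.shape[1], -1))    # (C_v, THW)
    # Every query pixel attends over every pixel of every memory frame.
    attn = F.softmax(q.t() @ k / Ck ** 0.5, dim=1)                       # (HW, THW)
    return (v @ attn.t()).reshape(-1, H, W)                              # (C_v, H, W)

# Toy usage with two memorized frames.
out = memory_read(torch.randn(64, 30, 54),
                  torch.randn(2, 64, 30, 54),
                  torch.randn(2, 512, 30, 54))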

PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation

This work addresses semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations, with the PReMVOS algorithm.

Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching

“Tracking-by-detection” is introduced into VOS, coherently integrating segmentation into tracking by proposing a new temporal aggregation network and a novel dynamic time-evolving template-matching mechanism, which achieve significantly improved performance.

Online Adaptation of Convolutional Neural Networks for Video Object Segmentation

Online Adaptive Video Object Segmentation (OnAVOS) is proposed, which updates the network online using training examples selected based on the network's confidence and the spatial configuration, and adds a pretraining step based on objectness, learned on PASCAL.

Fast Online Object Tracking and Segmentation: A Unifying Approach

This method improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task, and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second.