Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching
@inproceedings{Azimi2021HybridS2SVO,
  title={Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching},
  author={Fatemeh Azimi and Stanislav Frolov and Federico Raue and J{\"o}rn Hees and Andreas R. Dengel},
  booktitle={VISIGRAPP},
  year={2021}
}
One-shot Video Object Segmentation (VOS) is the task of tracking an object of interest pixel by pixel through a video sequence, given the segmentation mask of the first frame at inference time. In recent years, Recurrent Neural Networks (RNNs) have been widely used for VOS, but they often suffer from drift and error propagation. In this work, we study an RNN-based architecture and address some of these issues by proposing a hybrid sequence-to-sequence architecture…
2 Citations
Self-supervised Test-time Adaptation on Video Data
- 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2022
This paper explores whether recent progress in test-time adaptation in the image domain and in self-supervised learning can be leveraged to adapt a model to previously unseen and unlabelled videos presenting both mild (but arbitrary) and severe covariate shifts.
Spatial Transformer Networks for Curriculum Learning
- arXiv
- 2021
This work takes inspiration from Spatial Transformer Networks (STNs) in order to form an easy-to-hard curriculum, and hypothesizes that images processed by STNs can be seen as easier tasks and utilized in the interest of curriculum learning.
References
Showing 1-10 of 57 references
Revisiting Sequence-to-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory
- 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
This work builds upon a sequence-to-sequence approach that employs an encoder-decoder architecture together with a memory module for exploiting the sequential data and proposes a model that manipulates multiscale spatio-temporal information using memory-equipped skip connections.
RVOS: End-To-End Recurrent Network for Video Object Segmentation
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This work proposes a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable and achieves faster inference runtimes than previous methods, reaching 44ms/frame on a P100 GPU.
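For comparison with frame-rate figures, the reported per-frame runtime converts to throughput by simple unit arithmetic (no assumptions beyond the 44 ms/frame stated above):

```latex
\text{throughput} = \frac{1000~\text{ms/s}}{44~\text{ms/frame}} \approx 22.7~\text{frames/s}
```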
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
- BMVC
- 2017
Online Adaptive Video Object Segmentation (OnAVOS) is proposed which updates the network online using training examples selected based on the confidence of the network and the spatial configuration and adds a pretraining step based on objectness, which is learned on PASCAL.
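The summary says OnAVOS selects online training examples by network confidence and spatial configuration. A minimal sketch of one plausible reading follows, assuming positives are pixels whose foreground probability exceeds a threshold and negatives are pixels far from the mask predicted for the previous frame. The function name `select_online_samples` and the threshold values are hypothetical, chosen for a toy grid rather than taken from the paper.

```python
import math


def select_online_samples(fg_prob, prev_mask, pos_thresh=0.9, neg_dist=3.0):
    """Pick pixels to use as online training labels (hypothetical sketch).

    fg_prob:   2-D list of foreground probabilities for the current frame.
    prev_mask: 2-D list of 0/1 values, the mask predicted for the previous frame.
    Returns (positives, negatives) as lists of (row, col) coordinates.
    """
    mask_pixels = [(r, c) for r, row in enumerate(prev_mask)
                   for c, v in enumerate(row) if v]
    positives, negatives = [], []
    for r, row in enumerate(fg_prob):
        for c, p in enumerate(row):
            if p > pos_thresh:
                # Confidently predicted foreground becomes a positive example.
                positives.append((r, c))
            elif mask_pixels:
                # Euclidean distance to the nearest pixel of the previous mask.
                d = min(math.hypot(r - mr, c - mc) for mr, mc in mask_pixels)
                if d > neg_dist:
                    # Far from where the object was: use as a background example.
                    negatives.append((r, c))
    return positives, negatives


fg = [[0.99, 0.5, 0.1, 0.0, 0.0],
      [0.95, 0.2, 0.1, 0.0, 0.0]]
prev = [[1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]]
pos, neg = select_online_samples(fg, prev)
```

Pixels that are neither confident nor far away are left unlabelled, so the online update only trains on examples the sketch considers reliable.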
Learning Video Object Segmentation from Static Images
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
It is demonstrated that highly accurate object segmentation in videos can be achieved with a convolutional neural network (convnet) trained on static images only, using a combination of offline and online learning strategies.
Anchor Diffusion for Unsupervised Video Object Segmentation
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
Inspired by the non-local operators, a technique to establish dense correspondences between pixel embeddings of a reference "anchor" frame and the current one is introduced, which allows the learning of pairwise dependencies at arbitrarily long distances without conditioning on intermediate frames.
A Transductive Approach for Video Object Segmentation
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This work proposes a simple yet strong transductive method that requires no additional modules, datasets, or dedicated architectural designs; it takes a label propagation approach in which pixel labels are passed forward based on feature similarity in an embedding space.
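Label propagation by feature similarity can be sketched as follows: each current-frame pixel takes a similarity-weighted average of the labels of its most similar reference pixels. This is a minimal pure-Python illustration, assuming dot-product similarity, top-k selection, and a softmax over the kept similarities; the function name `propagate_labels` and the parameters `k` and `temperature` are hypothetical, not the paper's exact formulation.

```python
import math


def propagate_labels(ref_feats, ref_labels, cur_feats, k=2, temperature=1.0):
    """Propagate soft labels from reference pixels to current pixels (sketch).

    ref_feats:  embedding vectors, one per reference pixel.
    ref_labels: label probabilities (e.g. foreground), one per reference pixel.
    cur_feats:  embedding vectors, one per current-frame pixel.
    """
    out = []
    for q in cur_feats:
        # Dot-product similarity to every reference pixel.
        sims = [sum(a * b for a, b in zip(q, f)) for f in ref_feats]
        # Keep the k most similar reference pixels.
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
        # Softmax over the kept similarities (max-subtracted for stability).
        m = max(sims[i] for i in top)
        w = [math.exp((sims[i] - m) / temperature) for i in top]
        z = sum(w)
        out.append(sum(wi * ref_labels[i] for wi, i in zip(w, top)) / z)
    return out


ref_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
ref_labels = [1.0, 0.0, 1.0]
labels = propagate_labels(ref_feats, ref_labels, [[1.0, 0.0], [0.0, 1.0]])
```

Restricting the average to the top-k neighbours keeps a few weakly similar background pixels from diluting the label of a clearly matched pixel.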
Efficient Video Object Segmentation via Network Modulation
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2018
This work proposes a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object; it is 70× faster than fine-tuning approaches while achieving similar accuracy.
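The summary does not say how a single forward pass adapts the model. One common way to condition a network without fine-tuning is feature-wise affine modulation: per-channel scales computed once from the annotated object, plus a per-position bias from a spatial prior. The sketch below assumes that form and a 2-D Gaussian prior around the object's last known position; the function names `gaussian_prior` and `modulate` are hypothetical, and this is an illustration of the general technique rather than the paper's architecture.

```python
import math


def gaussian_prior(h, w, center, sigma):
    """Hypothetical spatial prior: a 2-D Gaussian centred on the object's last position."""
    cy, cx = center
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def modulate(features, channel_scales, spatial_bias):
    """Feature-wise affine modulation: out[c][y][x] = scale[c] * f[c][y][x] + bias[y][x].

    features:       C x H x W nested lists (one feature map per channel).
    channel_scales: C values, assumed to come from the annotated first frame.
    spatial_bias:   H x W map shared by all channels.
    """
    return [[[s * v + b for v, b in zip(frow, brow)]
             for frow, brow in zip(fmap, spatial_bias)]
            for fmap, s in zip(features, channel_scales)]


feats = [[[1.0, 2.0]], [[3.0, 4.0]]]  # C=2, H=1, W=2
out = modulate(feats, [2.0, 0.0], [[0.5, 0.0]])
prior = gaussian_prior(3, 3, (1, 1), 1.0)
```

Because the scales and bias are produced by a forward computation rather than by gradient steps, adapting to a new object costs one pass instead of many fine-tuning iterations.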
Video Object Segmentation Using Space-Time Memory Networks
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This work proposes a novel solution for semi-supervised video object segmentation that leverages memory networks and learns to read relevant information from all available sources, to better handle challenges such as appearance changes and occlusions.
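Reading from a memory of past frames can be sketched as a key-value lookup: each query pixel compares its key with the keys of every pixel stored from earlier frames, and returns a softmax-weighted mix of the stored values. This is a minimal pure-Python sketch assuming dot-product similarity and memory frames flattened into one list; the name `memory_read` is hypothetical, and details such as how keys and values are computed or combined afterwards are not covered by the summary.

```python
import math


def memory_read(query_keys, mem_keys, mem_values):
    """Key-value memory read for one query frame (sketch).

    query_keys: key vectors, one per query-frame pixel.
    mem_keys:   key vectors for every pixel of all memory frames (flattened).
    mem_values: value vectors aligned with mem_keys.
    Returns one read-out value vector per query pixel.
    """
    dim = len(mem_values[0])
    out = []
    for q in query_keys:
        # Similarity of this query pixel to every memory pixel.
        sims = [sum(a * b for a, b in zip(q, k)) for k in mem_keys]
        # Softmax weights (max-subtracted for numerical stability).
        m = max(sims)
        w = [math.exp(s - m) for s in sims]
        z = sum(w)
        # Weighted sum of memory values, one component at a time.
        out.append([sum(wi * v[d] for wi, v in zip(w, mem_values)) / z
                    for d in range(dim)])
    return out


read = memory_read([[1.0, 0.0]],
                   [[10.0, 0.0], [0.0, 10.0]],
                   [[1.0, 0.0], [0.0, 1.0]])
```

Since every stored pixel is a candidate match, the read-out can draw on any earlier frame where the object looked similar, which is what lets such a memory cope with occlusions and appearance changes.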
Video Object Segmentation without Temporal Information
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2019
Semantic One-Shot Video Object Segmentation is presented, based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot).
One-Shot Video Object Segmentation
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
One-Shot Video Object Segmentation (OSVOS) is presented, based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot).