CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing

@article{Duarte2019CapsuleVOSSV,
  title={CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing},
  author={Kevin Duarte and Yogesh Singh Rawat and Mubarak Shah},
  journal={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019},
  pages={8479--8488}
}
  • Kevin Duarte, Y. Rawat, M. Shah
  • Published 30 September 2019
  • Computer Science, Engineering
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
In this work we propose a capsule-based approach for semi-supervised video object segmentation. Current video object segmentation methods are frame-based and often require optical flow to capture temporal consistency across frames, which can be difficult to compute. To this end, we propose a video-based capsule network, CapsuleVOS, which can segment several frames at once conditioned on a reference frame and segmentation mask. This conditioning is performed through a novel routing algorithm for… 
DIPNet: Dynamic Identity Propagation Network for Video Object Segmentation
TLDR
A Dynamic Identity Propagation Network (DIPNet) that adaptively propagates and accurately segments the video objects over time and provides state-of-the-art performance with time efficiency is proposed.
Integrating Long-Short Term Network for Efficient Video Object Segmentation
TLDR
This work develops an efficient and fully end-to-end model to achieve fast and accurate VOS, named Long-Short Term Network (LSTNet), which contains a long-term network to encode absolute object variations and a short-term network to capture relative object dynamics.
Dual Temporal Memory Network for Efficient Video Object Segmentation
TLDR
An end-to-end network which stores short- and long-term video sequence information preceding the current frame as the temporal memories to address the temporal modeling in VOS is presented.
Kernelized Memory Network for Video Object Segmentation
TLDR
A kernelized memory network (KMN) is proposed that surpasses the state-of-the-art on standard benchmarks by a significant margin and uses the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and segment boundary extraction.
Deep Transport Network for Unsupervised Video Object Segmentation
The popular unsupervised video object segmentation methods fuse the RGB frame and optical flow via a two-stream network. However, they cannot handle the distracting noises in each input modality,
DMVOS: Discriminative Matching for Real-time Video Object Segmentation
TLDR
This work proposes Discriminative Matching for real-time Video Object Segmentation (DMVOS), a real-time VOS framework with high accuracy to fill this gap in segmentation accuracy.
PMVOS: Pixel-Level Matching-Based Video Object Segmentation
TLDR
This work proposes a novel method, PM-based video object segmentation (PMVOS), that constructs strong template features containing the information of all past frames and applies self-attention to the similarity maps generated from PM to capture global dependencies.
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
TLDR
A simple yet effective approach to modeling space-time correspondences in the context of video object segmentation that achieves new state-of-the-art results on both DAVIS and YouTubeVOS datasets while running significantly faster at 20+ FPS for multiple objects without bells and whistles.
Memory Selection Network for Video Propagation
TLDR
A memory selection network that learns to select suitable guidance from all previous frames for effective and robust propagation is proposed; it consistently improves performance and can robustly handle challenging scenarios in video propagation.
Pixel-Level Bijective Matching for Video Object Segmentation
TLDR
A bijective matching mechanism to find the best matches from the query frame to the reference frame and vice versa is introduced and a mask embedding module to improve the existing mask propagation method is proposed.

References

SHOWING 1-10 OF 40 REFERENCES
RVOS: End-To-End Recurrent Network for Video Object Segmentation
TLDR
This work proposes a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable and achieves faster inference runtimes than previous methods, reaching 44ms/frame on a P100 GPU.
VideoMatch: Matching based Video Object Segmentation
TLDR
This work develops a novel matching based algorithm for video object segmentation that learns to match extracted features to a provided template without memorizing the appearance of the objects.
Fast and Accurate Online Video Object Segmentation via Tracking Parts
TLDR
This paper proposes a fast and accurate video object segmentation algorithm that can immediately start the segmentation process once receiving the images, and performs favorably against state-of-the-art algorithms in accuracy on the DAVIS benchmark dataset, while achieving much faster runtime performance.
Motion-Guided Cascaded Refinement Network for Video Object Segmentation
TLDR
This work proposes a motion-guided cascaded refinement network for video object segmentation, and introduces a single-channel residual attention module in CRN to incorporate the coarse segmentation map as attention, which makes the network effective and efficient in both training and testing.
SegFlow: Joint Learning for Video Object Segmentation and Optical Flow
TLDR
This paper proposes an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos, and demonstrates that introducing optical flow improves the performance of segmentation, against the state-of-the-art algorithms.
Efficient Video Object Segmentation via Network Modulation
TLDR
This work proposes a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object and is 70× faster than fine-tuning approaches and achieves similar accuracy.
Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation
TLDR
A deep recurrent network that is capable of segmenting and tracking objects in video simultaneously by their temporal continuity, yet able to re-identify them when they re-appear after a prolonged occlusion is formulated.
Video Object Segmentation by Learning Location-Sensitive Embeddings
TLDR
An end-to-end training network is proposed which accomplishes foreground predictions by leveraging location-sensitive embeddings that are capable of distinguishing the pixels of similar objects.
Fast Video Object Segmentation by Reference-Guided Mask Propagation
TLDR
A deep Siamese encoder-decoder network is proposed that is designed to take advantage of mask propagation and object detection while avoiding the weaknesses of both approaches, and achieves accuracy competitive with state-of-the-art methods while running in a fraction of time compared to others.
MoNet: Deep Motion Exploitation for Video Object Segmentation
TLDR
A novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement, provides new state-of-the-art performance on three competitive benchmark datasets.