One-stage Video Instance Segmentation: From Frame-in Frame-out to Clip-in Clip-out
@article{Li2022OnestageVI, title={One-stage Video Instance Segmentation: From Frame-in Frame-out to Clip-in Clip-out}, author={Minghan Li and Lei Zhang}, journal={ArXiv}, year={2022}, volume={abs/2203.06421} }
Many video instance segmentation (VIS) methods partition a video sequence into individual frames to detect and segment objects frame by frame. However, such a frame-in frame-out (FiFo) pipeline is ineffective at exploiting temporal information. Based on the fact that adjacent frames in a short clip are highly coherent in content, we propose to extend the one-stage FiFo framework to a clip-in clip-out (CiCo) one, which performs VIS clip by clip. Specifically, we stack FPN features of all frames…
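To make the FiFo/CiCo distinction concrete, the following is a minimal Python sketch under stated assumptions. It assumes the video is split into consecutive, non-overlapping clips of a fixed length `clip_len`, and it gathers per-frame features into a list whose leading axis is time, loosely mirroring the stacking of per-frame FPN features described above. The helpers `split_into_clips`, `run_fifo` and `run_cico` and the callbacks `segment_frame`/`segment_clip` are hypothetical names, not the paper's API, and the toy averaging "model" in the usage lines merely stands in for a real segmentation head; it is not the authors' method.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")  # per-frame feature type (hypothetical, e.g. an FPN feature map)
O = TypeVar("O")  # per-frame output type (hypothetical, e.g. instance masks)


def split_into_clips(num_frames: int, clip_len: int) -> List[List[int]]:
    """Partition frame indices 0..num_frames-1 into consecutive, non-overlapping
    clips of at most clip_len frames (assumption: the last clip may be shorter)."""
    if clip_len < 1:
        raise ValueError("clip_len must be >= 1")
    return [list(range(start, min(start + clip_len, num_frames)))
            for start in range(0, num_frames, clip_len)]


def run_fifo(frame_feats: Sequence[F], segment_frame: Callable[[F], O]) -> List[O]:
    """Frame-in frame-out: one model call per frame, each frame seen in isolation."""
    return [segment_frame(f) for f in frame_feats]


def run_cico(frame_feats: Sequence[F], clip_len: int,
             segment_clip: Callable[[List[F]], List[O]]) -> List[O]:
    """Clip-in clip-out: one model call per clip; the model sees all frames
    of the clip jointly and returns one output per frame."""
    outputs: List[O] = []
    for clip in split_into_clips(len(frame_feats), clip_len):
        clip_feats = [frame_feats[i] for i in clip]  # leading axis = time
        clip_out = segment_clip(clip_feats)
        if len(clip_out) != len(clip):
            raise ValueError("segment_clip must return one output per frame")
        outputs.extend(clip_out)
    return outputs


if __name__ == "__main__":
    feats = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy scalar "features", one per frame
    print(run_fifo(feats, lambda f: f))  # [1.0, 2.0, 3.0, 4.0, 5.0]
    # Toy clip model: every frame receives the mean feature of its clip.
    print(run_cico(feats, 2, lambda c: [sum(c) / len(c)] * len(c)))  # [1.5, 1.5, 3.5, 3.5, 5.0]
```

The design point is the call granularity: FiFo invokes the model once per frame, so each output can depend only on its own frame, whereas CiCo invokes it once per clip, so each frame's output can draw on the other frames in the same clip (in the toy run, frames 0 and 1 both receive their clip mean, 1.5).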
One Citation
MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos
- Computer Science
- 2023
This work proposes to mine discriminative query embeddings (MDQE) to segment occluded instances in challenging videos, and introduces an inter-instance mask repulsion loss that distances each instance from its nearby non-target instances.
References
Showing 1–10 of 30 references
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
The method, named MaskProp, adapts the popular Mask R-CNN to video by adding a mask propagation branch that propagates frame-level object instance masks from each video frame to all the other frames in a video clip to predict clip-level instance tracks with respect to the object instances segmented in the middle frame of the clip.
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos
- Computer Science
- ECCV
- 2020
A novel approach is proposed that segments and tracks instances across space and time in a single stage; it is trained end-to-end to learn spatio-temporal embeddings as well as the parameters required to cluster pixels belonging to a specific object instance over an entire video clip.
Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
- Computer Science
- 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This work proposes a simple yet effective one-stage video instance segmentation framework based on spatial calibration and temporal fusion, namely STMask, which helps the framework handle challenging videos with motion blur, partial occlusion and unusual object-to-camera poses.
Video Instance Segmentation using Inter-Frame Communication Transformers
- Computer Science
- NeurIPS
- 2021
This work proposes Inter-frame Communication Transformers (IFC), which reduce the overhead of passing information between frames by efficiently encoding the context within the input clip: concise memory tokens convey information across frames and summarize each frame's scene.
Video Instance Segmentation with a Propose-Reduce Paradigm
- Computer Science
- 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This work proposes a new paradigm, Propose-Reduce, that generates complete sequences for input videos in a single step, and builds a sequence propagation head on top of an existing image-level instance segmentation network for long-term propagation.
SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation
- Computer Science
- 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This work proposes a one-stage spatial granularity network (SG-Net) and presents state-of-the-art comparisons on the YouTube-VIS dataset, in the hope that it can serve as a strong and flexible baseline for the VIS task.
CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation
- Computer Science
- AAAI
- 2021
This work proposes a novel comprehensive feature aggregation approach (CompFeat) that refines features at both the frame level and the object level with temporal and spatial context information, to eliminate the ambiguities introduced by using only single-frame features.
Crossover Learning for Fast Online Video Instance Segmentation
- Computer Science
- 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This work proposes a novel crossover learning scheme that uses the instance feature in the current frame to localize the same instance pixel-wise in other frames, enabling efficient cross-frame instance-to-pixel relation learning and bringing cost-free improvement during inference.
Video Instance Segmentation
- Computer Science
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This work extends the image instance segmentation problem to the video domain for the first time and proposes a novel algorithm, MaskTrack R-CNN, for this task, which is the simultaneous detection, segmentation and tracking of instances in videos.
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
- Computer Science
- ECCV
- 2020
This work proposes a fast single-stage instance segmentation method that preserves instance-specific spatial information by splitting the mask prediction of an instance across different sub-regions of a detected bounding box, leading to improved mask predictions. It also introduces a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection.