Weakly Supervised Video Salient Object Detection

@inproceedings{Zhao2021WeaklySV,
  title={Weakly Supervised Video Salient Object Detection},
  author={Wangbo Zhao and Jing Zhang and Long Li and Nick Barnes and Nian Liu and Junwei Han},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={16821--16830}
}
  • Wangbo Zhao, Jing Zhang, Long Li, Nick Barnes, Nian Liu, Junwei Han
  • Published 6 April 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain. To relieve the burden of data annotation, we present the first weakly supervised video salient object detection model based on relabeled “fixation guided scribble annotations”. Specifically, an “Appearance-motion fusion module” and bidirectional ConvLSTM based framework are proposed to achieve…

Weakly Supervised Video Salient Object Detection via Point Supervision

A hybrid token attention module is proposed, which mixes optical flow and image information from orthogonal directions, adaptively highlighting critical optical flow information (channel dimension) and critical token information (spatial dimension) to infer saliency maps with temporal information.

Learning Video Salient Object Detection Progressively from Unlabeled Videos

A novel VSOD method is proposed that uses a progressive framework to locate and then segment salient objects without any video annotation; it is competitive with fully supervised methods and outperforms state-of-the-art weakly supervised and unsupervised methods.

Scribble-based Boundary-aware Network for Weakly Supervised Salient Object Detection in Remote Sensing Images

Spatiotemporal context-aware network for video salient object detection

SCANet is proposed, which uses a pyramid dilated 3D convolutional (PD3C) module to generate rich temporal features by leveraging context information, together with a feature aggregation module designed to effectively integrate spatial and temporal features.

Weakly-Supervised Audiovisual Network For Video Saliency Estimation

An audio-visual network is proposed that uses incomplete visual fixation labels for video saliency estimation and learns to localize sound sources in weakly-supervised settings.

A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels

A new weakly-supervised SOD task under hybrid labels is introduced, where the supervision labels include a large number of coarse labels generated by a traditional unsupervised method and a small number of real labels, and a new pipeline framework with three sophisticated training strategies is designed.

Motion-aware Memory Network for Fast Video Salient Object Detection

A space-time memory (STM)-based network is proposed that extracts useful temporal information for the current frame from adjacent frames as the temporal branch of VSOD, takes inspiration from the boundary supervision commonly used in image salient object detection (ISOD), and adopts an effective fusion strategy for the spatial and temporal branches.

A Novel Long-Term Iterative Mining Scheme for Video Salient Object Detection

A novel VSOD approach is proposed that converts VSOD, a sequential task, into a data mining problem, i.e., decomposing the input video sequence into object proposals in advance and then mining as many salient object proposals as possible in an easy-to-hard way.

PSNet: Parallel Symmetric Network for Video Salient Object Detection

A VSOD network with up and down parallel symmetry, named PSNet, is proposed and the Importance Perception Fusion (IPF) module is used to fuse the features from two parallel branches according to their different importance in different scenarios.

Appearance-guided Attentive Self-Paced Learning for Unsupervised Salient Object Detection

This work proposes a novel appearance-guided attentive self-paced learning framework for unsupervised salient object detection that achieves state-of-the-art performance against existing USOD methods and is comparable to the latest supervised SOD methods.

References

Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

This paper presents an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module and proposes a novel method for generating pixel-level pseudo-labels from sparsely annotated frames.

Weakly Supervised Salient Object Detection With Spatiotemporal Cascade Neural Networks

A novel weakly supervised approach to salient object detection in video is proposed, which can learn a robust saliency prediction model by using very limited manually labeled data and a large amount of weakly labeled data that could be easily generated in a supervised approach.

Learning Video Object Segmentation From Unlabeled Videos

A unified unsupervised/weakly supervised learning framework, called MuG, is introduced that comprehensively captures intrinsic properties of VOS at multiple granularities; it can help advance understanding of visual patterns in VOS and significantly reduce the annotation burden.

Flow Guided Recurrent Neural Encoder for Video Salient Object Detection

Flow guided recurrent neural encoder (FGRNE) is presented, an accurate and end-to-end learning framework for video salient object detection that significantly outperforms state-of-the-art methods on the public benchmarks of DAVIS and FBMS.

Learning to Detect Salient Objects with Image-Level Supervision

This paper develops a weakly supervised learning method for saliency detection using image-level tags only, which outperforms unsupervised methods by a large margin and achieves performance comparable or even superior to that of fully supervised counterparts.

Deeply Supervised Salient Object Detection with Short Connections

A new saliency method is proposed by introducing short connections to the skip-layer structures within the HED architecture, which produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency, effectiveness, and simplicity over the existing algorithms.

Weakly-Supervised Salient Object Detection via Scribble Annotations

This paper proposes a weakly-supervised salient object detection model to learn saliency from scribble annotations, and presents a new metric, termed saliency structure measure, as a complementary metric to evaluate sharpness of the prediction.

Video Salient Object Detection via Fully Convolutional Networks

A novel data augmentation technique is proposed that simulates video training data from existing annotated image datasets, which enables the deep video saliency network to learn diverse saliency information and prevents overfitting with the limited number of training videos.

Shifting More Attention to Video Salient Object Detection

A visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames that cover diverse realistic scenes, objects, instances and motions, and a baseline model equipped with a saliency shift-aware convLSTM, which can efficiently capture video saliency dynamics through learning human attention-shift behavior, are proposed.

A Benchmark Dataset and Saliency-Guided Stacked Autoencoders for Video-Based Salient Object Detection.

The proposed unsupervised baseline approach for video-based SOD is compared with 31 state-of-the-art models on the proposed dataset and outperforms 30 of them, including 19 image-based classic (unsupervised or non-deep learning) models, six image-based deep learning models, and five video-based unsupervised models.