Corpus ID: 238227080

ASOD60K: An Audio-Induced Salient Object Detection Dataset for Panoramic Videos

  • Yi Zhang, Fang-Yi Chao, Ge-Peng Ji, Deng-Ping Fan, Lu Zhang, Ling Shao
  • Published 2021
  • Computer Science
Exploring what humans pay attention to in dynamic panoramic scenes is useful for many fundamental applications, including augmented reality (AR) in retail, AR-powered recruitment, and visual language navigation. With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-/object-level saliency detection tasks, we focus on audio-induced salient object detection (SOD), where the salient objects are labeled with…


Shifting More Attention to Video Salient Object Detection
A visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames covering diverse realistic scenes, objects, instances, and motions, is proposed, along with a baseline model equipped with a saliency-shift-aware convLSTM that efficiently captures video saliency dynamics by learning human attention-shift behavior.
A Benchmark Dataset and Saliency-Guided Stacked Autoencoders for Video-Based Salient Object Detection
The proposed unsupervised baseline approach for video-based SOD is compared with 31 state-of-the-art models on the proposed dataset and outperforms 30 of them, including 19 image-based classic (unsupervised or non-deep-learning) models, six image-based deep learning models, and five video-based unsupervised models.
From Semantic Categories to Fixations: A Novel Weakly-Supervised Visual-Auditory Saliency Detection Approach
A novel weakly supervised approach that alleviates the demand for large-scale training sets in visual-audio model training by using only video category tags, and proposes selective class activation mapping (SCAM), which follows a coarse-to-fine strategy to select the most discriminative regions in the spatial-temporal-audio circumstance.
Revisiting Video Saliency: A Large-Scale Benchmark and a New Model
This work introduces a new benchmark for predicting human eye movements during dynamic scene free-viewing, and proposes a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning.
Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection
A Constrained Self-Attention (CSA) operation is designed to capture motion cues, based on the prior that objects always move in a continuous trajectory; the resulting network outperforms previous state-of-the-art methods in both accuracy and speed.
Looking for the Detail and Context Devils: High-Resolution Salient Object Detection
This work proposes the first end-to-end learnable framework, named Dual ReFinement Network (DRFNet), for fully automatic HRSOD, which can enlarge receptive fields and obtain more discriminative features from high-resolution images and generalizes well on typical low-resolution benchmarks.
Motion Guided Attention for Video Salient Object Detection
A multi-task motion-guided video salient object detection network that learns two sub-tasks using two sub-networks, one for salient object detection in still images and the other for motion saliency detection in optical flow images, and that significantly outperforms existing state-of-the-art algorithms on a wide range of benchmarks.
Semi-Supervised Video Salient Object Detection Using Pseudo-Labels
This paper presents an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module and proposes a novel method for generating pixel-level pseudo-labels from sparsely annotated frames.
Salient Object Detection Driven by Fixation Prediction
A novel neural network called Attentive Saliency Network (ASNet) is built that learns to detect salient objects from fixation maps and offers an efficient recurrent mechanism for sequential refinement of the segmentation map.
Learning to Detect Salient Objects with Image-Level Supervision
This paper develops a weakly supervised learning method for saliency detection using image-level tags only, which outperforms unsupervised methods by a large margin and achieves performance comparable or even superior to that of fully supervised counterparts.