Rethinking Video Salient Object Ranking

Jiaying Lin, Huankang Guan, Rynson W. H. Lau
Salient Object Ranking (SOR) involves ranking the degree of saliency of multiple salient objects in an input image. Recently, a method was proposed for ranking salient objects in an input video based on a predicted fixation map. It relies solely on the density of fixations within the salient objects to infer their saliency ranks, which is incompatible with how humans perceive saliency ranking. In this work, we propose to explicitly learn the spatial and temporal relations between different…
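The fixation-density baseline that the abstract critiques can be sketched as follows. This is a minimal illustration, not the paper's method: the function name and the toy data are invented, and it simply ranks object masks by the mean value of a predicted fixation map inside each mask.

```python
import numpy as np

def rank_by_fixation_density(fixation_map, object_masks):
    """Rank objects by the density of predicted fixations inside each mask.

    fixation_map : 2-D array of per-pixel fixation probabilities.
    object_masks : list of boolean 2-D arrays, one per salient object.
    Returns object indices sorted from most to least salient.
    """
    densities = [fixation_map[m].mean() if m.any() else 0.0
                 for m in object_masks]
    return sorted(range(len(object_masks)),
                  key=lambda i: densities[i], reverse=True)

# Toy example: a 4x4 fixation map and two object masks.
fix = np.array([[0.9, 0.8, 0.0, 0.0],
                [0.7, 0.6, 0.0, 0.0],
                [0.0, 0.0, 0.1, 0.1],
                [0.0, 0.0, 0.1, 0.1]])
mask_a = np.zeros((4, 4), bool); mask_a[:2, :2] = True  # high fixation density
mask_b = np.zeros((4, 4), bool); mask_b[2:, 2:] = True  # low fixation density
print(rank_by_fixation_density(fix, [mask_a, mask_b]))  # [0, 1]
```

Because only the density inside each mask matters here, two objects with very different spatial or temporal context can receive the same rank, which is the limitation the paper argues against.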



Ranking Video Salient Object Detection
This paper proposes a new definition of salient objects in videos, ranking salient objects, which considers relative saliency ranking assisted by eye-fixation points, and builds a ranking video salient object dataset (RVSOD).
Salient Object Ranking with Position-Preserved Attention
This paper proposes the first end-to-end framework for the SOR task, solving it in a multi-task learning fashion, and significantly outperforms the previous state-of-the-art method.
Video Saliency Detection Using Object Proposals
A novel approach to identify salient object regions in videos via object proposals by ranking and selecting the salient proposals based on object-level saliency cues, which produces significant improvements over state-of-the-art algorithms.
Instance-Level Relative Saliency Ranking with Graph Reasoning
A novel unified model is presented as the first end-to-end solution, where an improved Mask R-CNN is first used to segment salient instances and a saliency ranking branch is then added to infer the relative saliency.
Inferring Attention Shift Ranks of Objects for Image Saliency
This paper proposes a learning-based CNN that leverages both bottom-up and top-down attention mechanisms to predict saliency ranks, and achieves state-of-the-art performance on salient object rank prediction.
Revisiting Video Saliency Prediction in the Deep Learning Era
A new benchmark, called DHF1K (Dynamic Human Fixation 1K), is introduced for predicting fixations during dynamic scene free-viewing, and a novel video saliency model is proposed, called ACLNet (Attentive CNN-LSTM Network), that augments the CNN-LSTM architecture with a supervised attention mechanism to enable fast end-to-end saliency learning.
Revisiting Salient Object Detection: Simultaneous Detection, Ranking, and Subitizing of Multiple Salient Objects
A novel deep learning solution is proposed based on a hierarchical representation of relative saliency and stage-wise refinement, and it is shown that the problem of salient object subitizing can be addressed with the same network.
Motion Guided Attention for Video Salient Object Detection
A multi-task motion-guided video salient object detection network that learns to accomplish two sub-tasks using two sub-networks, one for salient object detection in still images and the other for motion saliency detection in optical-flow images, and significantly outperforms existing state-of-the-art algorithms on a wide range of benchmarks.
Video Salient Object Detection via Fully Convolutional Networks
A novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables the deep video saliency network to learn diverse saliency information and prevents overfitting with the limited number of training videos.
Video Saliency Detection via Spatial-Temporal Fusion and Low-Rank Coherency Diffusion
A novel video saliency detection method based on spatial-temporal saliency fusion and low-rank coherency-guided saliency diffusion, which guarantees the temporal smoothness of saliency maps and boosts the accuracy of the computed saliency maps.