Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention

@inproceedings{bhattacharya2022showme,
  title={Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention},
  author={Uttaran Bhattacharya and Gang Wu and Stefano Petrangeli and Viswanathan Swaminathan and Dinesh Manocha},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}
We propose a method to detect individualized highlights for users on given target videos based on their preferred highlight clips marked on previous videos they have watched. Our method explicitly leverages the contents of both the preferred clips and the target videos using pre-trained features for the objects and the human activities. We design a multi-head attention mechanism to adaptively weigh the preferred clips based on their object- and human-activity-based contents, and fuse them using… 
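The attention-based fusion described above can be illustrated with a simplified sketch. The function names, feature dimensions, and head count below are assumptions for exposition, not the authors' implementation: target-video segment features attend over a user's preferred-clip features via scaled dot-product multi-head attention, producing user-conditioned segment representations.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(queries, keys, values, num_heads):
    """Scaled dot-product attention with num_heads parallel heads.

    queries: (n_q, d) target-video segment features
    keys, values: (n_k, d) preferred-clip features
    Returns (n_q, d) fused, user-conditioned segment features.
    """
    n_q, d = queries.shape
    assert d % num_heads == 0, "feature dim must divide evenly across heads"
    d_h = d // num_heads
    out = np.zeros((n_q, d))
    for h in range(num_heads):
        sl = slice(h * d_h, (h + 1) * d_h)
        q, k, v = queries[:, sl], keys[:, sl], values[:, sl]
        # Attention weights: how relevant each preferred clip is
        # to each target segment, per head.
        weights = softmax(q @ k.T / np.sqrt(d_h))  # (n_q, n_k)
        out[:, sl] = weights @ v
    return out

# Toy example: 5 target segments, 3 preferred clips, 8-dim features.
rng = np.random.default_rng(0)
segments = rng.standard_normal((5, 8))
clips = rng.standard_normal((3, 8))
fused = multi_head_attention(segments, clips, clips, num_heads=2)
print(fused.shape)  # (5, 8)
```

In the paper's setting, separate attention streams would operate on object-based and human-activity-based features before fusion; the sketch shows only the core attention pooling.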



HighlightMe: Detecting Highlights from Human-Centric Videos

A domain- and user-preference-agnostic approach to detecting highlightable excerpts from human-centric videos, achieving a 4–12% improvement in mean average precision against human-annotated highlights over state-of-the-art methods on benchmark datasets, without requiring user-provided preferences or dataset-specific fine-tuning.

Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection

A novel three-dimensional (spatial + temporal) attention model that automatically localizes the key elements in a video without any extra supervised annotations, achieving significant improvement over state-of-the-art methods.

Video Highlight Detection via Region-Based Deep Ranking Model

The video highlight detection task is to localize key elements (moments of the user's major or special interest) in a video; this work addresses it with a region-based deep ranking model that extracts features from regions of interest in the video.

Adaptive Video Highlight Detection by Learning from User History

A simple yet effective framework that learns to adapt highlight detection to a user by exploiting the user's history in the form of highlights that the user has previously created is proposed.

Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization

  • Ting Yao, Tao Mei, Y. Rui
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
A novel pairwise deep ranking model that employs deep learning techniques to learn the relationship between highlight and non-highlight video segments, improving over the state-of-the-art RankSVM method by 10.5% in terms of accuracy.
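The core of pairwise ranking objectives like this one is a max-margin loss that pushes a highlight segment's score above a non-highlight segment's score. The sketch below is a generic illustration of that idea under assumed names and margin value, not the paper's exact formulation:

```python
import numpy as np

def pairwise_ranking_loss(highlight_scores, non_highlight_scores, margin=1.0):
    """Hinge loss over all (highlight, non-highlight) score pairs.

    Penalizes any pair where the highlight segment is not scored
    at least `margin` above the non-highlight segment.
    """
    h = np.asarray(highlight_scores, dtype=float)[:, None]      # (n_h, 1)
    n = np.asarray(non_highlight_scores, dtype=float)[None, :]  # (1, n_n)
    return np.maximum(0.0, margin - (h - n)).mean()

# Highlights already ranked well above non-highlights -> zero loss.
print(pairwise_ranking_loss([3.0, 2.5], [0.1, 0.0]))  # 0.0
# A violating pair incurs a positive penalty.
print(pairwise_ranking_loss([0.5], [0.4]))  # 0.9
```

In training, the scores would come from a learned scoring network over segment features; the loss gradient then drives highlight scores up and non-highlight scores down.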

Less Is More: Learning Highlight Detection From Video Duration

This work proposes a scalable unsupervised solution that exploits video duration as an implicit supervision signal, and introduces a novel ranking framework that prefers segments from shorter videos, while properly accounting for the inherent noise in the (unlabeled) training data.

Exploiting Web Images for Video Highlight Detection With Triplet Deep Ranking

This work proposes a novel triplet deep ranking approach to video highlight detection that uses Web images as weak supervision and is fully category-independent.

Creating Summaries from User Videos

This paper proposes a novel approach and a new benchmark for video summarization, focusing on user videos (raw videos containing a set of interesting events), and generates high-quality results comparable to manual, human-created summaries.

PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation

A global ranking model that can condition on a particular user's interests is presented, which proves more precise than user-agnostic baselines even with a single person-specific example.

Ranking Domain-Specific Highlights by Analyzing Edited Videos

This work presents a fully automatic system for ranking domain-specific highlights in unconstrained personal videos by analyzing online edited videos and shows that impressive highlights can be retrieved without additional human supervision for domains like skating, surfing, skiing, gymnastics, parkour, and dog activity.