Corpus ID: 19140723

Video Summarization via Semantic Attended Networks

@inproceedings{Wei2018VideoSV,
  title={Video Summarization via Semantic Attended Networks},
  author={Huawei Wei and Bingbing Ni and Yichao Yan and Huanyu Yu and Xiaokang Yang and Chen Yao},
  booktitle={AAAI},
  year={2018}
}
The goal of video summarization is to distill a raw video into a more compact form without losing much semantic information. However, previous methods mainly consider the diversity, representativeness, and interestingness of the obtained summary, and seldom pay sufficient attention to the semantic information of the resulting frame set, especially long-range temporal semantics. To explicitly address this issue, we propose a novel technique which is able to extract the most semantically relevant…
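The abstract is truncated, but its core idea, attending to semantically relevant frames, can be illustrated with a minimal sketch. The code below is not the authors' network; it only shows how soft attention over frame features can yield per-frame relevance scores (PyTorch; all layer sizes are assumptions):

```python
# Minimal sketch of attention-based frame scoring for summarization.
# NOT the architecture from the paper; it only illustrates the general
# idea of attending to semantically relevant frames.
import torch
import torch.nn as nn

class AttentionFrameScorer(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=256):  # dims are assumptions
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, frames):                    # frames: (T, feat_dim)
        logits = self.attn(frames).squeeze(-1)    # (T,)
        return torch.softmax(logits, dim=0)       # attention over time

scores = AttentionFrameScorer()(torch.randn(120, 1024))
summary_idx = scores.topk(15).indices             # top-15 frames as keyframes
```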
Learning Unsupervised Video Summarization with Semantic-Consistent Network
TLDR
A novel semantic-consistent unsupervised framework termed ScSUM is proposed, which is able to extract the essence of the video by obtaining the greatest semantic similarity, and requires no manual description.
Property-Constrained Dual Learning for Video Summarization
TLDR
A dual learning framework is proposed that integrates summary generation and video reconstruction, aiming to reward the summary generator with the assistance of the video reconstructor; two property models are developed to measure the representativeness and diversity of the generated summary.
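As a rough illustration of such property models, here is one formulation of diversity and representativeness rewards that is common in the unsupervised summarization literature; it is a hedged sketch, not necessarily this paper's models:

```python
# Diversity: mean pairwise dissimilarity among selected frames.
# Representativeness: how well the selected frames cover all frames.
import torch
import torch.nn.functional as F

def diversity_reward(feats, selected):
    s = F.normalize(feats[selected], dim=1)           # (k, d) unit vectors
    sim = s @ s.t()                                   # cosine similarities
    k = s.size(0)
    off_diag = sim[~torch.eye(k, dtype=torch.bool)]   # drop self-similarity
    return (1 - off_diag).mean()

def representativeness_reward(feats, selected):
    d = torch.cdist(feats, feats[selected])           # (T, k) distances
    return torch.exp(-d.min(dim=1).values.mean())     # coverage of whole video

feats = torch.randn(100, 512)                         # T frame features (assumed dims)
sel = torch.tensor([3, 20, 47, 81])
print(diversity_reward(feats, sel), representativeness_reward(feats, sel))
```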
User-Ranking Video Summarization With Multi-Stage Spatio–Temporal Representation
TLDR
This paper presents a novel supervised video summarization scheme based on three-stage deep neural networks, and proposes a simple but effective user-ranking method to cope with the labeling-subjectivity problem of user-created video summaries, refining labeling quality for robust supervised learning.
Discriminative Feature Learning for Unsupervised Video Summarization
TLDR
This paper addresses the problem of unsupervised video summarization, which automatically extracts key-shots from an input video, and designs a novel two-stream network named Chunk and Stride Network (CSNet) that utilizes local (chunk) and global (stride) temporal views of the video features.
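At the tensor level, the "chunk" and "stride" views could look like the sketch below; the actual CSNet feeds these views into a learned two-stream network, and all shapes here are assumptions:

```python
# "Chunk" = contiguous local segments; "stride" = subsampled global views.
import torch

feats = torch.randn(96, 1024)      # T=96 frame features (dims are assumptions)
m = 4                              # number of chunks / stride length

chunks = feats.view(m, 96 // m, -1)           # chunk i = contiguous segment i
strides = [feats[i::m] for i in range(m)]     # stride i = every m-th frame

print(chunks.shape, strides[0].shape)         # (4, 24, 1024), (24, 1024)
```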
Multi-modal Summarization for Video-containing Documents
TLDR
This work proposes a novel multi-modal summarization task to summarize from a document and its associated video, and builds a baseline general model with effective strategies, i.e., bi-hop attention and improved late fusion mechanisms to bridge the gap between different modalities.
ERA: ENTITY–RELATIONSHIP AWARE VIDEO SUMMARIZATION
Video summarization aims to simplify large-scale video browsing by generating concise, short summaries that diverge from but well represent the original video. Due to the scarcity of video annotations,…
Bi-Directional Self-Attention with Relative Positional Encoding for Video Summarization
  • Jingxu Lin, S. Zhong, 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), 2020
TLDR
This paper proposes a novel deep summarization framework named Bi-Directional Self-Attention with Relative Positional Encoding for Video Summarization (BiDAVS) that can be highly parallelized and effectively capture long-range temporal dependencies of sequential frames by computing bi-directional attention.
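For illustration, a minimal single-head version of bi-directional self-attention with a learned relative positional bias might look like this; it is a PyTorch sketch in the spirit of the description, not the BiDAVS implementation, and all sizes are assumptions:

```python
import torch
import torch.nn as nn

class RelPosSelfAttention(nn.Module):
    def __init__(self, dim=512, max_len=512):    # sizes are assumptions
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))
        self.max_len = max_len
        self.scale = dim ** -0.5

    def forward(self, x):                         # x: (T, dim), T <= max_len
        T = x.size(0)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = (q @ k.t()) * self.scale         # (T, T), attends both directions
        rel = torch.arange(T)[:, None] - torch.arange(T)[None, :]  # offsets i - j
        scores = scores + self.rel_bias[rel + self.max_len - 1]    # relative bias
        return torch.softmax(scores, dim=-1) @ v  # (T, dim)

out = RelPosSelfAttention()(torch.randn(100, 512))
```

Because every frame attends to all others in one matrix product, the computation parallelizes across the sequence, which is the parallelism advantage the TLDR mentions over recurrent models.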
Video Summarization with a Dual Attention Capsule Network
In this paper, we address the problem of video summarization, which aims at selecting a subset of video frames as a summary to represent the original video contents compactly and completely. We…
Comprehensive Video Understanding: Video Summarization with Content-Based Video Recommender Design
TLDR
A scalable deep neural network is proposed to predict whether a video segment is useful to users by explicitly modelling both the segment and the whole video; the work is further extended with data augmentation and multi-task learning to prevent the model from early-stage overfitting.

References

Showing 1-10 of 32 references
Enhancing Video Summarization via Vision-Language Embedding
TLDR
This paper addresses video summarization, or the problem of distilling a raw video into a shorter form while still capturing the original story, by extending a recent submodular summarization approach with representativeness and interestingness objectives computed on features from a joint vision-language embedding space.
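Submodular objectives of this kind are typically optimized greedily. The sketch below combines a facility-location representativeness term with a per-frame interestingness score; it is a generic illustration under assumed inputs, not the paper's objective or its vision-language embedding space:

```python
# Greedy submodular keyframe selection: at each step, pick the frame whose
# addition most improves coverage (facility location) plus interestingness.
import numpy as np

def greedy_summary(feats, interest, k, lam=0.5):
    """feats: (T, d) frame embeddings; interest: (T,) per-frame scores."""
    sim = feats @ feats.T                       # similarity matrix
    selected, covered = [], np.full(len(feats), -np.inf)
    for _ in range(k):
        gains = [np.maximum(covered, sim[j]).sum() + lam * interest[j]
                 if j not in selected else -np.inf
                 for j in range(len(feats))]
        j = int(np.argmax(gains))
        selected.append(j)
        covered = np.maximum(covered, sim[j])   # update best coverage per frame
    return selected

feats = np.random.randn(60, 128).astype(np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(greedy_summary(feats, np.random.rand(60), k=5))
```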
Summary Transfer: Exemplar-Based Subset Selection for Video Summarization
TLDR
A novel subset selection technique that leverages supervision in the form of human-created summaries to perform automatic keyframe-based video summarization, and shows how to extend the method to exploit semantic side information about the video's category/genre to guide the transfer process by those training videos semantically consistent with the test input.
Textually Customized Video Summaries
TLDR
This paper trains a deep architecture to effectively learn semantic embeddings of video frames by leveraging the abundance of image-caption data in a progressive and residual manner, and shows that the method is able to generate semantically diverse video summaries by utilizing only the learned visual embeddings.
Video Summarization With Attention-Based Encoder–Decoder Networks
TLDR
This paper proposes a novel video summarization framework named attentive encoder–decoder networks for video summarization (AVS), in which the encoder uses a bidirectional long short-term memory (BiLSTM) to encode the contextual information among the input video frames.
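A minimal sketch of such a design, a BiLSTM encoder with an additive attention head producing per-frame importance scores, might look like this; sizes and the scoring head are assumptions, not the published AVS network:

```python
import torch
import torch.nn as nn

class BiLSTMAttnScorer(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256):   # sizes are assumptions
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden,
                               bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)         # additive attention energy
        self.head = nn.Linear(2 * hidden, 1)         # per-frame importance

    def forward(self, frames):                       # frames: (B, T, feat_dim)
        h, _ = self.encoder(frames)                  # (B, T, 2*hidden) context
        a = torch.softmax(self.attn(h), dim=1)       # attention over time
        ctx = (a * h).sum(dim=1, keepdim=True)       # video-level context vector
        return torch.sigmoid(self.head(h + ctx)).squeeze(-1)  # (B, T) scores

scores = BiLSTMAttnScorer()(torch.randn(2, 120, 1024))
```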
TVSum: Summarizing web videos using titles
TLDR
A novel co-archetypal analysis technique is developed that learns canonical visual concepts shared between video and images, but not in either alone, by finding a joint-factorial representation of two data sets.
Video Summarization with Long Short-Term Memory
TLDR
Long Short-Term Memory (LSTM), a special type of recurrent neural network, is used to model the variable-range dependencies entailed in the task of video summarization, improving summarization by reducing the discrepancies in statistical properties across datasets.
Bidirectional Long-Short Term Memory for Video Description
TLDR
A novel video captioning framework, termed BiLSTM, which deeply captures bidirectional global temporal structure in video, comprehensively preserving sequential and visual information and adaptively learning dense visual features and sparse semantic representations for videos and sentences.
Query-Focused Extractive Video Summarization
TLDR
A probabilistic model, Sequential and Hierarchical Determinantal Point Process (SH-DPP), is developed, for query-focused extractive video summarization, which returns a summary by selecting key shots from the video.
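DPP-based selection is usually decoded with greedy MAP inference. Below is a generic greedy log-determinant sketch over a positive semi-definite kernel; SH-DPP's sequential, hierarchical, and query-conditioned structure is omitted, so this is only the underlying selection principle:

```python
# Greedy MAP inference for a determinantal point process (DPP):
# repeatedly add the item that maximizes log det of the kernel submatrix,
# which trades off per-item quality against diversity of the chosen set.
import numpy as np

def greedy_dpp(L, k):
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for j in range(len(L)):
            if j in selected:
                continue
            idx = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best, best_gain = j, logdet
        selected.append(best)
    return selected

feats = np.random.randn(30, 64)            # 30 shot features (assumed dims)
L = feats @ feats.T + 1e-3 * np.eye(30)    # PSD similarity kernel
print(greedy_dpp(L, k=5))
```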
Large-Scale Video Summarization Using Web-Image Priors
TLDR
This work applies a novel insight to develop a summarization algorithm that uses web-image-based prior information in an unsupervised manner, and proposes a framework that relies on multiple summaries obtained through crowdsourcing to automatically evaluate summarization algorithms at large scale.
Video summarization from spatio-temporal features
TLDR
This paper presents a video summarization method based on the study of spatio-temporal activity within the video which was tested on the BBC Rushes Summarization task within the TRECVID 2008 campaign.