• Corpus ID: 48353451

Fast forwarding Egocentric Videos by Listening and Watching

Vinicius Signori Furlan, Ruzena Bajcsy, Erickson R. Nascimento
The remarkable technological advance in well-equipped wearable devices is pushing an increasing production of long first-person videos. However, since most of these videos have long and tedious parts, they are forgotten or never seen. Despite a large number of techniques proposed to fast-forward these videos by highlighting relevant moments, most of them are image based only. Most of these techniques disregard other relevant sensors present in the current devices such as high-definition… 

Musical Hyperlapse: A Multimodal Approach to Accelerate First-Person Videos

A new methodology is presented that creates accelerated videos with background music that preserves the emotion induced by the visual and acoustic modalities; it achieves the best performance in matching emotion similarity while maintaining the visual quality of the output video.

Robust Motion Compensation for Forensic Analysis of Egocentric Video using Joint Stabilization and Tracking

The work presented in this paper describes robust methods for video frame stabilization and in-frame object stabilization and tracking for egocentric video analysis, specifically adapted for forensic investigation.

SpeedNet: Learning the Speediness in Videos

This work applies SpeedNet, a novel deep network trained to detect whether a video is playing at its normal rate or has been sped up, to generate time-varying, adaptive video speedups, which allow viewers to watch videos faster but with fewer of the jittery, unnatural motions typical of videos that are sped up uniformly.

Towards Semantic Fast-Forward and Stabilized Egocentric Videos

This work presents a methodology capable of summarizing and stabilizing egocentric videos by extracting the semantic information from the frames by using a new smoothness evaluation metric for egocentric videos.

Fast-forward video based on semantic extraction

A novel methodology is proposed that composes the fast-forward video by selecting frames based on semantic information extracted from images; it is shown to outperform the state of the art as far as semantic information is concerned and to produce videos that are more pleasant to watch.

Making a long story short: A multi-importance fast-forwarding egocentric videos with the emphasis on relevant objects

A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos

A new adaptive frame selection formulated as a weighted minimum reconstruction problem is presented, which combined with a smoothing frame transition method accelerates first-person videos emphasizing the relevant segments and avoids visual discontinuities.

Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization

  • Ting Yao, Tao Mei, Y. Rui
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
A novel pairwise deep ranking model is proposed that uses deep learning techniques to learn the relationship between highlight and non-highlight video segments; it improves on the state-of-the-art RankSVM method by 10.5% in accuracy.

Semantic-driven Generation of Hyperlapse from 360° Video

A system for converting a fully panoramic video into a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience that exploits visual saliency and semantics to non-uniformly sample in space and time for generating hyperlapses.

Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

It is argued that the visual and audio components of a video signal should be modeled jointly using a fused multisensory representation, and it is proposed to learn such a representation in a self-supervised way, by training a neural network to predict whether video frames and audio are temporally aligned.

Look, Listen and Learn

There is a valuable, but so far untapped, source of information contained in the video itself – the correspondence between the visual and the audio streams, and a novel “Audio-Visual Correspondence” learning task that makes use of this.

Story-Driven Summarization for Egocentric Video

  • Zheng Lu, K. Grauman
  • Computer Science
    2013 IEEE Conference on Computer Vision and Pattern Recognition
  • 2013
A video summarization approach that discovers the story of an egocentric video, and defines a random-walk based metric of influence between sub shots that reflects how visual objects contribute to the progression of events.