The Placing Task is a yearly challenge offered by the MediaEval Multimedia Benchmarking Initiative that requires participants to develop algorithms that automatically predict the geo-location of social media videos and images. We introduce a newly developed, standardized web-scale geo-tagged dataset for Placing Task 2014, which contains 5.5 [...]
The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the Rich Transcription community have adopted methods and techniques developed for the ICSI speaker diarization engine. Although there have been many related publications [...]
This paper summarizes our contribution to the Yahoo! task of the ACM Multimedia Grand Challenge. This challenge asks for the robust automatic segmentation of videos according to "narrative themes". Based on the automatic segmentation methods presented in [1] and partly [2], we describe a system to navigate Seinfeld episodes based on automatic segmentation [...]
In recent years, the problem of video location estimation (i.e., estimating the longitude/latitude coordinates of a video without GPS information) has been approached with diverse methods and ideas in the research community, and significant improvements have been made. So far, however, systems have only been compared against each other and no [...]
[...] AnalysiS in a High-level language (grant IIS-1251276). Any opinions, findings, and conclusions or recommendations are those of the authors and do not necessarily reflect the views of Cisco, LLNL, or the NSF. Abstract: The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research [...]
Based on contrastive experiments, this article discusses the difficulties of using state-of-the-art speaker recognition methods under realistic car noise conditions and argues that, even though work has been done in this area, current approaches fail to address the main problems occurring in this task. The article also proposes a [...]
Joke-o-mat HD is a system that allows a user to navigate sitcoms (such as Seinfeld) by "narrative themes", including scenes, punchlines, and dialog segments. The themes can be filtered by the main actors and by keyword. For example, the user can choose to see only punchlines by Kramer that contain the word "armoire". The system infers the narrative [...]
This article describes a system to navigate Seinfeld episodes based on acoustic event detection and speaker identification of the audio track and subsequent inference of narrative themes based on genre-specific production rules. The system distinguishes laughter, music, and other noise as well as speech segments. Speech segments are then identified against [...]
Recently, audio concepts have emerged as a useful building block in multimodal video retrieval systems. Information like "this file contains laughter", "this file contains engine sounds", or "this file contains slow music" can significantly improve purely visual-based retrieval. The weak point of current approaches to audio concept detection is that they [...]
In this article we review the methods we have developed for finding Mechanical Turk participants for the manual annotation of the geo-location of random videos from the web. We require high quality annotations for this project, as we are attempting to establish a human baseline for future comparison to machine systems. This task is different from a standard [...]