Luke R. Gottlieb

The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the rich transcription community have adopted methods and techniques developed for the ICSI speaker diarization engine. Although there have been many related publications ...
This paper summarizes our contribution to the Yahoo! task of the ACM Multimedia Grand Challenge. This challenge asks for the robust automatic segmentation of videos according to "narrative themes". Based on the automatic segmentation methods presented in [1] and partly [2], we describe a system to navigate Seinfeld episodes based on automatic segmentation ...
The Placing Task is a yearly challenge offered by the MediaEval Multimedia Benchmarking Initiative that requires participants to develop algorithms that automatically predict the geo-location of social media videos and images. We introduce a recent development of a new standardized web-scale geo-tagged dataset for Placing Task 2014, which contains 5.5 ...
Given the exponential growth of videos published on the Internet, mechanisms for clustering, searching, and browsing large numbers of videos have become a major research area. More importantly, there is a demand for event detectors that go beyond simply finding objects and instead detect more abstract concepts, such as "feeding an animal" or a ...
The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research in multimedia event detection (MED), i.e., automatically identifying what's happening in a video by analyzing the audio and visual content. The videos indexed in the YLI-MED corpus are a subset of the larger YLI ...
Joke-o-mat HD is a system that allows a user to navigate sitcoms (such as Seinfeld) by "narrative themes", including scenes, punchlines, and dialog segments. The themes can be filtered by the main actors and by keyword. For example, the user can choose to see only punchlines by Kramer that contain the word "armoire". The system infers the narrative ...
In this article, we review the methods we have developed for finding Mechanical Turk participants for the manual annotation of the geo-location of random videos from the web. We require high-quality annotations for this project, as we are attempting to establish a human baseline for future comparison to machine systems. This task is different from a standard ...
In recent years, the research community has approached the problem of video location estimation (i.e., estimating the longitude/latitude coordinates of a video without GPS information) with diverse methods and ideas, and significant improvements have been made. So far, however, systems have only been compared against each other and no ...
This article describes a system to navigate Seinfeld episodes based on acoustic event detection and speaker identification of the audio track and subsequent inference of narrative themes based on genre-specific production rules. The system distinguishes laughter, music, and other noise as well as speech segments. Speech segments are then identified against ...
Recently, audio concepts have emerged as a useful building block in multimodal video retrieval systems. Information like "this file contains laughter", "this file contains engine sounds", or "this file contains slow music" can significantly improve purely visual-based retrieval. The weak point of current approaches to audio concept detection is that they heavily ...