In this paper, we investigate new approaches to improve speech activity detection, speaker segmentation and speaker clustering. The goal is to address speaker diarization for meetings, where error rates are relatively high. Unlike existing methods, we propose a new iterative scheme that considers those three tasks as…
Audio-Visual People Diarization (AVPD) is an original framework that simultaneously improves audio, video, and audiovisual diarization results. Following a review of the literature on people diarization for audio and video content and its limitations, including our own contributions, we describe a method for associating audio and video…
Content-based people clustering is a crucial step for people indexing within video documents. In this paper, we investigate the use of both face and clothing features. A method of extracting a "keyface" for each video sequence is proposed. An algorithm based on the average of the N-minimum pair distances between local invariant features is used…
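The "average of the N-minimum pair distances" can be illustrated with a minimal sketch. It assumes each keyface is represented as a set of local descriptor vectors, that distances are Euclidean, and that the N smallest distances are taken over all cross-set pairs; the helper name `n_min_pair_distance` is hypothetical, and the paper's exact matching procedure is not given in this excerpt.

```python
import math
from typing import Sequence

Descriptor = Sequence[float]  # one local invariant feature vector (assumed)


def n_min_pair_distance(
    face_a: Sequence[Descriptor], face_b: Sequence[Descriptor], n: int
) -> float:
    """Hypothetical helper: mean of the n smallest Euclidean distances
    over all (descriptor in face_a, descriptor in face_b) pairs."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not face_a or not face_b:
        raise ValueError("both descriptor sets must be non-empty")
    # Distance between every cross-set pair, sorted ascending.
    distances = sorted(math.dist(a, b) for a in face_a for b in face_b)
    closest = distances[:n]  # if fewer than n pairs exist, use them all
    return sum(closest) / len(closest)


# Usage with toy 2-D descriptors (real local features are much longer).
keyface_1 = [[0.0, 0.0], [1.0, 0.0]]
keyface_2 = [[0.0, 1.0], [5.0, 5.0]]
print(n_min_pair_distance(keyface_1, keyface_2, n=2))  # mean of 1.0 and sqrt(2)
```

Averaging only the N closest pairs, rather than all pairs, plausibly keeps the score from being dominated by descriptors that have no counterpart in the other face (for example, background points); this rationale is an interpretation, not a claim from the abstract.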
To index the soundtrack of multimedia documents efficiently, it is necessary to extract elementary, homogeneous acoustic segments. In this paper, we explore such a prior partitioning, which consists in detecting the two basic components: speech and music. The originality of this work is that music and speech are not considered as two…
In this paper, we describe a new method for speaker segmentation and clustering of an audio document. For the segmentation phase, we combine the generalized likelihood ratio (GLR) and the Bayesian information criterion (BIC) in a way that avoids most parameter tuning. For the clustering phase, we use an existing approach that utilizes the eigen…
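For illustration, the textbook relationship between GLR and BIC for detecting a speaker change between two adjacent windows can be sketched as follows: ΔBIC = GLR − (λ/2)·P·log N, where P is the number of extra parameters of the two-model hypothesis and N the total number of frames. This is not the combination described in the paper, which the excerpt does not detail. The sketch assumes diagonal-covariance Gaussians over acoustic feature frames and a penalty weight `lam`; the function names `glr` and `delta_bic` are hypothetical.

```python
import math
from typing import Sequence

Frame = Sequence[float]  # one acoustic feature vector, e.g. MFCCs (assumed)


def _log_det_diag_cov(frames: Sequence[Frame]) -> float:
    """Log-determinant of the maximum-likelihood diagonal covariance."""
    n = len(frames)
    total = 0.0
    for k in range(len(frames[0])):
        values = [f[k] for f in frames]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        total += math.log(max(var, 1e-10))  # floor guards against log(0)
    return total


def glr(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Log GLR: two Gaussians (one per window) versus one Gaussian for x + y."""
    z = list(x) + list(y)
    return 0.5 * (
        len(z) * _log_det_diag_cov(z)
        - len(x) * _log_det_diag_cov(x)
        - len(y) * _log_det_diag_cov(y)
    )


def delta_bic(x: Sequence[Frame], y: Sequence[Frame], lam: float = 1.0) -> float:
    """GLR minus the BIC penalty; a positive value suggests a speaker change."""
    n = len(x) + len(y)
    extra_params = 2 * len(x[0])  # one extra mean vector + diagonal variances
    return glr(x, y) - 0.5 * lam * extra_params * math.log(n)


# Usage: two windows with clearly different statistics.
window_a = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [0.4, 0.6]]
window_b = [[10.0, 11.0], [11.0, 10.0], [10.5, 10.5], [10.2, 10.8], [10.8, 10.2], [10.4, 10.6]]
print(delta_bic(window_a, window_b) > 0)  # True
```

Raising `lam` makes this detector more conservative; choosing such a weight is the kind of tuning the abstract says its GLR/BIC combination largely avoids.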
Recent TV series tend to have increasingly complex plots. They follow the lives of numerous characters and are made up of multiple intertwined stories. In this paper, we introduce StoViz, a web-based interface that provides a fast overview of this kind of episode structure, based on our plot de-interlacing system. StoViz has two main goals. First, it provides the…
Multiple sub-stories usually coexist in every episode of a TV series. We propose several variants of an approach to plot de-interlacing based on scene clustering, with the ultimate goal of providing the end user with tools for a fast and easy overview of one episode, one season or the whole TV series. Each scene can be described in three different ways…
Our work deals with the classical problem of merging heterogeneous and asynchronous parameters. It is well known that lip reading improves speech recognition scores, especially in noisy conditions, so we study more closely the modeling of acoustic and articulatory parameters in order to propose new automatic speech recognition systems. We use a segmental…