Learn More
This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER),(More)
In this paper, we investigate new approaches to improve speech activity detection, speaker segmentation and speaker clustering. The main idea behind them is to deal with the problem of speaker diarization for meetings where error rates are relatively high. In opposition to existing methods, a new iterative scheme is proposed considering those three tasks as(More)
This paper addresses face diarization in videos, that is, deciding which face appears and when in the video. To achieve this face-track clustering task, we propose a hierarchical approach combining the strength of two complementary measures: (i) a pairwise matching similarity relying on local interest points allowing the accurate clustering of faces tracks(More)
Audio-Visual People Diarization (AVPD) is an original framework that simultaneously improves audio, video, and audiovisual diarization results. Following a literature review of people diarization for both audio and video content and their limitations, which includes our own contributions, we describe a proposed method for associating both audio and video(More)
The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes our participation to the TRECVID 2010 semantic indexing and instance search tasks. For the semantic indexing task, we evaluated a number of different descriptors and tried different fusion strategies, in particular hierarchical fusion. The best(More)
The process of manually labeling data is very expensive and sometimes infeasible due to privacy and security issues. This paper investigates the use of two algorithms for clustering unla-beled training i-vectors. This aims at improving speaker recognition performance by using state-of-the-art supervised techniques in the context of the NIST i-vector Machine(More)
In this paper, we describe a new application for multimedia indexing, using a system that monitors the instrumental activities of daily living to assess the cognitive decline caused by dementia. The system is composed of a wearable camera device designed to capture audio and video data of the instrumental activities of a patient, which is leveraged with(More)
! A_CU-run6: local feature alone – average fusion of 3 SVM classification results for each concept using various feature representation choices. ! A_CU-run5: linear weighted fusion of A_CU-run6 with two grid-based global features (color moment and wavelet texture). ! A_CU-run4: linear weighted fusion of A_CU-run5 with a SVM classification result using(More)
I4U is a joint entry of nine research Institutes and Universities across 4 continents to NIST SRE 2012. It started with a brief discussion during the Odyssey 2012 workshop in Singapore. An online discussion group was soon set up, providing a discussion platform for different issues surrounding NIST SRE'12. Noisy test segments, uneven multi-session training,(More)
This paper presents the LIUM open-source speaker diarization toolbox, mostly dedicated to broadcast news. This tool includes both Hierarchical Agglomerative Clustering using well-known measures such as BIC and CLR, and the new ILP clustering algorithm using i-vectors. Diarization systems are tested on the French evaluation data from ESTER, ETAPE and REPERE(More)