• Corpus ID: 15452240

TOWARDS ROBUST SPEAKER SEGMENTATION: THE ICSI-SRI FALL 2004 DIARIZATION SYSTEM

@inproceedings{Wooters2004TOWARDSRS,
  title={TOWARDS ROBUST SPEAKER SEGMENTATION: THE ICSI-SRI FALL 2004 DIARIZATION SYSTEM},
  author={Chuck Wooters and James G. Fung and Barbara Peskin and Xavier Sanahuja i Anguera},
  year={2004}
}
We describe the ICSI-SRI entry in the Fall 2004 DARPA EARS Metadata Evaluation. The current system was derived from ICSI’s Fall 2003 Speaker-attributed STT system. Our system is an agglomerative clustering system that uses a BIC-like measure to determine when to stop merging clusters and to decide which pairs of clusters to merge. The main advantage of this approach is that it does not require pre-trained acoustic models, providing robustness and portability. Changes for this year’s system… 
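The merge-and-stop logic described in the abstract can be illustrated with a minimal sketch. It is not the ICSI-SRI implementation: the abstract only says the measure is "BIC-like", so this sketch swaps in the textbook ΔBIC with one full-covariance Gaussian per cluster. The function names (`gaussian_loglik`, `merge_score`, `agglomerate`), the penalty weight `lam`, and the small covariance regularizer are all assumptions made for illustration.

```python
import numpy as np


def gaussian_loglik(frames):
    """Log-likelihood of frames (n, d) under a single Gaussian fitted to them.

    Assumption: one full-covariance Gaussian per cluster; a tiny diagonal
    term keeps the covariance invertible for short segments.
    """
    frames = np.asarray(frames, dtype=float)
    n, d = frames.shape
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True)) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2.0 * np.pi) + logdet + d)


def merge_score(a, b, lam=1.0):
    """BIC(merged model) minus BIC(two separate models); > 0 favours merging."""
    n, d = len(a) + len(b), a.shape[1]
    n_params = d + d * (d + 1) / 2  # mean plus symmetric covariance
    gain_from_split = (
        gaussian_loglik(a) + gaussian_loglik(b) - gaussian_loglik(np.vstack([a, b]))
    )
    return 0.5 * lam * n_params * np.log(n) - gain_from_split


def agglomerate(segments, lam=1.0):
    """Greedy bottom-up clustering; returns lists of original segment indices."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        # The same score picks the pair to merge and decides when to stop.
        best_score, best_pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = merge_score(clusters[i], clusters[j], lam)
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score <= 0:
            break  # no remaining pair is better modelled as one cluster
        i, j = best_pair
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j], members[j]
    return members
```

Calling `agglomerate` on a list of per-segment feature matrices (for example, frames of cepstral features) returns the segment indices grouped by inferred speaker. Because the models are fitted on the recording itself, the sketch needs no pre-trained acoustic models, which is the property the abstract highlights.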


Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System
TLDR
This paper describes the ICSI-SRI entry in the Rich Transcription 2005 Spring Meeting Recognition Evaluation, and adds several features to the baseline clustering system, including a “purification” module that tries to keep the clusters acoustically homogeneous throughout the clustering process, and a delay&sum beamforming algorithm which enhances signal quality for the multiple distant microphones sub-task.
Improvements in Speaker Diarization System
TLDR
An automatic speaker diarization system for natural, multi-speaker meeting conversations using one central microphone that adapts the segment model from a Universal Background Model, and uses the cross-likelihood ratio instead of the Bayesian Information Criterion for merging.
FRAME PURIFICATION FOR CLUSTER COMPARISON IN SPEAKER DIARIZATION
TLDR
This paper presents an algorithm that aims to purify the clusters by eliminating the non-discriminant frames (selected using a likelihood-based metric) when comparing two clusters.
The ICSI RT07s Speaker Diarization System
TLDR
This paper used the most recent available version of the beam-forming toolkit, implemented a new speech/non-speech detector that does not require models trained on meeting data and performed the development on a much larger set of recordings.
An improved speaker diarization system
TLDR
This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations that adapts the segment model from a universal background model, and uses the cross-likelihood ratio instead of the Bayesian Information Criterion for merging.
Robust Speaker Diarization for meetings
TLDR
Four of the main improvements to the ICSI speaker diarization system submitted for the NIST Rich Transcription evaluation (RT06s) conducted in the meetings environment are introduced: a new training-free speech/non-speech detection algorithm, the introduction of a new algorithm for system initialization, and a frame purification algorithm to increase cluster differentiability.
New insights into hierarchical clustering and linguistic normalization for speaker diarization
TLDR
A new top-down/bottom-up system combination that outperforms the respective baseline system is introduced, along with a new technology able to limit the influence of linguistic effects, which are responsible for biasing the convergence of the diarization system.
Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization
TLDR
This paper presents three techniques to select the parameters individually for each case in agglomerative clustering, obtaining a system that is more robust to changes in the data.
The ICSI RT-09 Speaker Diarization System
TLDR
The first full conceptual description of the ICSI speaker diarization system as presented to the National Institute of Standards and Technology Rich Transcription 2009 (NIST RT-09) evaluation is presented, which consists of online and offline subsystems, multi-stream and single-stream implementations, and audio and audio-visual approaches.
Speaker diarization of spontaneous meeting room conversations
TLDR
Two contributions are proposed: new features based on the structure of a conversation, such as silence and speaker-change statistics, for overlap detection; and different artificial neural network architectures that extract speaker-discriminant features for use as input to speaker diarization systems.

References

Improved Unknown-Multiple Speaker clustering using HMM
TLDR
Improvements over the previous work are reported, and it is shown that the system converges to the right number of clusters in the case of a limited number of speakers.
Unknown-multiple speaker clustering using HMM
TLDR
It is shown that the number of clusters found often corresponds to the actual number of speakers, and the effect of using only the features from highly voiced frames as a means of improving the robustness and computational complexity of the algorithm is examined.
Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion
TLDR
The segmentation algorithm can successfully detect acoustic changes; the clustering algorithm can produce clusters with high purity, leading to improvements in accuracy through unsupervised adaptation as much as the ideal clustering by the true speaker identities.
Robust HMM-based speech/music segmentation
TLDR
A new approach towards high performance speech/music segmentation on realistic tasks related to the automatic transcription of broadcast news by using a 2-state (speech and non-speech) hidden Markov model with minimum duration constraints.
A robust speaker clustering algorithm
  • J. Ajmera, Chuck Wooters
  • Computer Science
    2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)
  • 2003
TLDR
The algorithm automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers and has the following advantages: no threshold adjustment requirements; no need for training/development data; and robustness to different data conditions.
XBIC: a new measure for speaker segmentation towards automatic indexing of the speech signal
The evolution of the information society has brought with it a constant increase in audiovisual content, which is normally archived in multimedia databases so that it can be…