An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems

@inproceedings{Ahmed2017AnUS,
  title={An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems},
  author={Hany Ahmed and Mohamed S. Elaraby and Abdullah M. Moussa and Mostafa Elhosiny and Sherif M. Abdou and Mohsen A. A. Rashwan},
  booktitle={WANLP@EACL},
  year={2017}
}
In this paper, we introduce an enhancement for speech recognition systems using an unsupervised speaker clustering technique. The proposed technique is mainly based on I-vectors and a Self-Organizing Map (SOM) neural network. The input to the proposed algorithm is a set of speech utterances. For each utterance, we extract a 100-dimensional I-vector, and an SOM is then used to group the utterances by speaker. In our experiments, we compared our technique with Normalized Cross Likelihood ratio…
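The clustering step described in the abstract (one fixed-length vector per utterance, grouped by an SOM) can be sketched as follows. This is a minimal, illustrative SOM in pure Python, not the authors' implementation: the map size, the linear decay of the learning rate and neighbourhood radius, the Euclidean distance, and the hypothetical helper names `train_som` and `assign_clusters` are all assumptions. Each trained map node is treated as one speaker cluster.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean distance)."""
    return min(range(len(nodes)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(nodes[k], x)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on equal-length vectors.

    Hypothetical helper for illustration; returns one weight vector per map node.
    """
    rng = random.Random(seed)
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise every node with a copy of a randomly chosen input vector.
    nodes = [list(rng.choice(vectors)) for _ in coords]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for idx in order:
            x = vectors[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = coords[best_matching_unit(nodes, x)]
            for (r, c), w in zip(coords, nodes):
                # Gaussian neighbourhood on the map grid around the winning node.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return nodes


def assign_clusters(vectors, nodes):
    """Label each utterance vector with the index of its best-matching node."""
    return [best_matching_unit(nodes, x) for x in vectors]
```

For example, with a 1×2 map and two well-separated groups of synthetic 100-dimensional vectors standing in for I-vectors, all vectors from one group map to the same node. A real system would need a map size and stopping rule suited to the expected number of speakers; the abstract does not state which were used.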

Citations

Intra-Speaker Variability Assessment for Speaker Recognition in Degraded Conditions: A Case of African Tone Languages
TLDR
This chapter presents an extensive study of intra-speaker variability, reporting results from frame-by-frame analysis, principal component analysis (PCA), and self-organizing map (SOM) clustering and visualization of the extracted speech features.
Natural Language Processing: Speaker, Language, and Gender Identification with LSTM
TLDR
The main contribution of this paper is a high speaker recognition rate for text-independent continuous speech using a small ratio of training to test data, achieved by applying a long short-term memory (LSTM) recurrent neural network.
Speaker Identification for Japanese Prefectural Assembly Minutes
Recently, we have been creating a corpus of Japanese prefectural assembly minutes. The corpus contains assembly minutes of all 47 prefectures between April 2011 and March 2015. This four-year period
Anomaly Detection through Transfer Learning in Agriculture and Manufacturing IoT Systems
TLDR
It is shown how predictive failure classification can be achieved in these two application domains, paving the way for predictive maintenance.

References

Showing 10 of the 14 references.
Front-End Factor Analysis for Speaker Verification
TLDR
This work extends previous work by proposing a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space, defined using a simple factor analysis and named the total variability space because it models both speaker and channel variability.
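For reference, the total variability model summarised above is usually written as below. This is the standard formulation from the i-vector literature, not text quoted from this page; in the abstract's setting, $\mathbf{w}$ would be 100-dimensional.

```latex
\mathbf{M} = \mathbf{m} + \mathbf{T}\,\mathbf{w}, \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```

Here $\mathbf{M}$ is the speaker- and channel-dependent GMM mean supervector of an utterance, $\mathbf{m}$ is the speaker- and channel-independent supervector (typically the UBM means), $\mathbf{T}$ is the low-rank total variability matrix, and the i-vector is the posterior mean of $\mathbf{w}$ given the utterance.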
Speaker segmentation and clustering
The ICSI RT07s Speaker Diarization System
TLDR
This paper used the most recent available version of the beam-forming toolkit, implemented a new speech/non-speech detector that does not require models trained on meeting data and performed the development on a much larger set of recordings.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Speaker diarization using normalized cross likelihood ratio
TLDR
The Normalized Cross Likelihood Ratio is used as a dissimilarity measure between two Gaussian speaker models in the speaker change detection step, and its contribution to speaker change detection performance is compared with that of the BIC and Hotelling's T² statistic.
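Since NCLR is the baseline the abstract compares against, its usual definition is given here as a sketch; check the cited paper for its exact normalisation. Let $X_i$ denote the $N_i$ frames of segment $i$, $M_i$ the Gaussian model trained on them, and $L(X \mid M)$ the likelihood of frames $X$ under model $M$:

```latex
\mathrm{NCLR}(M_i, M_j) = \frac{1}{N_i}\log\frac{L(X_i \mid M_i)}{L(X_i \mid M_j)}
                        + \frac{1}{N_j}\log\frac{L(X_j \mid M_j)}{L(X_j \mid M_i)}
```

Each term measures how much worse the other segment's model explains a segment's frames, normalised by segment length, so larger values indicate more dissimilar segments.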
A review on speaker diarization systems and approaches
Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion
TLDR
The segmentation algorithm can successfully detect acoustic changes, and the clustering algorithm can produce clusters with high purity, leading to accuracy improvements through unsupervised adaptation as large as those obtained with ideal clustering by true speaker identities.
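The BIC-based change detection summarised above compares, for a window of frames and a candidate split point, a single full-covariance Gaussian against two Gaussians fitted to the frames on each side of the split. A minimal pure-Python sketch of one common form of this test follows; the penalty weight `lam` (λ), the maximum-likelihood covariance estimate, and the hypothetical names `delta_bic`, `ml_covariance`, and `log_det` are illustrative assumptions rather than code from the cited paper. A positive value favours a change at the split.

```python
import math


def log_det(m):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky factorisation."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def ml_covariance(frames):
    """Maximum-likelihood (divide-by-N) covariance of a list of d-dimensional frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / n
             for j in range(d)] for i in range(d)]


def delta_bic(frames, split, lam=1.0):
    """Delta-BIC for splitting frames at index `split` (two Gaussians vs one)."""
    n, d = len(frames), len(frames[0])
    left, right = frames[:split], frames[split:]
    # Log-likelihood gain of the two-model hypothesis (constant terms cancel).
    r = 0.5 * (n * log_det(ml_covariance(frames))
               - len(left) * log_det(ml_covariance(left))
               - len(right) * log_det(ml_covariance(right)))
    # Penalty for the extra mean vector and full covariance matrix.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * math.log(n)
    return r - lam * penalty
```

In practice the split point is scanned across a sliding window and a change is declared at the position with the largest positive ΔBIC; λ is a tuning parameter.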
Robust Speaker Diarization for meetings
TLDR
Four of the main improvements to the ICSI speaker diarization system submitted to the NIST Rich Transcription evaluation (RT06s) on meeting data are introduced, among them a new training-free speech/non-speech detection algorithm, a new algorithm for system initialization, and a frame purification algorithm that increases cluster differentiability.
A Study of Interspeaker Variability in Speaker Verification
TLDR
It is shown that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it is capable of performing at least as well as fusions of multiple systems of other types.
The Kaldi Speech Recognition Toolkit
TLDR
This paper describes the design of Kaldi, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state transducers, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.