Corpus ID: 237502743

Self-Supervised Metric Learning With Graph Clustering For Speaker Diarization

  • Prachi Singh, Sriram Ganapathy
  • Published 14 September 2021
  • Computer Science, Engineering
  • ArXiv
In this paper, we propose a novel algorithm for speaker diarization that uses metric learning for graph-based clustering. Graph clustering algorithms operate on an adjacency matrix of similarity scores, computed between speaker embeddings extracted from pairs of audio segments within the given recording. The proposed approach jointly learns the speaker embeddings and the similarity metric using principles of self-supervised learning. The metric learning…
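
To make the adjacency-matrix idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's method: the paper learns the embeddings and the similarity metric jointly, whereas this sketch assumes fixed toy embeddings, uses cosine similarity as a stand-in metric, and replaces the graph clustering algorithm with simple connected components over a thresholded graph. The function names, the toy vectors, and the threshold value are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adjacency_matrix(embeddings):
    """Pairwise similarity scores between all segment embeddings."""
    n = len(embeddings)
    return [
        [cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)]
        for i in range(n)
    ]


def cluster_by_components(adjacency, threshold):
    """Label segments by connected components of the thresholded graph.

    Two segments are linked when their similarity is >= threshold.
    This is a simple stand-in for a real graph clustering algorithm.
    """
    n = len(adjacency)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and adjacency[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical 2-D embeddings for four audio segments from one recording.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
A = adjacency_matrix(segments)
labels = cluster_by_components(A, threshold=0.8)
print(labels)
```

In this toy case the first two segments and the last two segments fall into separate clusters, which would correspond to two speakers. A practical system would replace the fixed threshold and connected components with a clustering algorithm that is robust to noisy similarity scores.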

LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
A supervised method to measure the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM), which significantly outperforms the state-of-the-art methods and achieves a diarization error rate below average.
Speaker diarization with plda i-vector scoring and unsupervised calibration
A system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion is proposed, and it is shown that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.
A spectral clustering approach to speaker diarization
To apply the Ng-Jordan-Weiss (NJW) spectral clustering algorithm to speaker diarization, some domain-specific solutions to the open issues of this algorithm are proposed: choice of metric; selection of scaling parameter; estimation of the number of clusters.
Fully Supervised Speaker Diarization
A fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN), given extracted speaker-discriminative embeddings, which decodes in an online fashion while most state-of-the-art systems rely on offline clustering.
Speaker diarization of broadcast streams using two-stage clustering based on i-vectors and cosine distance scoring
  • J. Silovský, J. Prazak
  • Computer Science
  • 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
Improvement of the performance over the baseline system based on the Bayesian Information Criterion (BIC) is demonstrated and significant impact of cepstral mean normalization is highlighted.
Speaker Diarization with LSTM
This work combines LSTM-based d-vector audio embeddings with recent work in nonparametric clustering to obtain a state-of-the-art speaker diarization system that achieves a 12.0% diarization error rate on NIST SRE 2000 CALLHOME, while the model is trained with out-of-domain data from voice search logs.
SpectralNet: Spectral Clustering using Deep Neural Networks
A deep learning approach to spectral clustering that overcomes the major limitations of scalability and generalization of the spectral embedding and applies VC dimension theory to derive a lower bound on the size of SpectralNet.
Online speaker diarization using adapted i-vector transforms
  • W. Zhu, Jason W. Pelecanos
  • Computer Science
  • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2016
This paper proposes a novel Maximum a Posteriori (MAP) adapted transform within an i-vector speaker diarization framework that operates in a strict left-to-right fashion.
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks
This work shows that VBx achieves superior performance on three of the most popular datasets for evaluating diarization (CALLHOME, AMI and DIHARD II) and presents for the first time the derivation and update formulae for the VBx model.
LEAP Diarization System for the Second DIHARD Challenge
A modified VB-HMM model with posterior scaling that provides significant improvements in the final diarization error rate (DER); an analysis of the proposed posterior scaling method shows that scaling improves discrimination among the HMM states in the VB-HMM.