Speaker diarization using one-class support vector machines
This paper proposes two algorithms for unsupervised two-speaker clustering. The first builds two SVM models, one per speaker. The second builds a single SVM model in which each speaker is assigned to one of the model's two classes. Both algorithms are based on the traditional two-class SVM and use MLSF coefficients as acoustic features to represent the speakers. Tests were conducted on the audio streams of two interview videos in Portuguese, each with two male speakers. The results should be regarded as preliminary, but no errors were found when the speech segmentation was carried out correctly.
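
The abstract does not describe the training procedure, so the following is only a minimal sketch of how the second algorithm (a single two-class SVM with one class per speaker) might be organised. Everything in it is an assumption made for illustration: the function names, the seeding rule, the alternating retrain and relabel loop, the Pegasos-style linear SVM, and the synthetic 2-D points that stand in for MLSF frames.

```python
import random


def train_linear_svm(frames, labels, lam=0.1, epochs=20, seed=0):
    """Fit a two-class linear SVM by Pegasos-style stochastic sub-gradient descent.

    A constant 1.0 is appended to every frame so that the bias is learned
    as an extra (regularised) weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in frames]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if margin < 1.0:  # hinge loss is active for this frame
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def decision(w, x):
    """Signed score of one frame under the augmented weight vector w."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def mean_vector(frames):
    """Component-wise mean of a list of frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cluster_two_speakers(segments, max_iter=10):
    """Label each speech segment +1 or -1 using a single two-class SVM.

    Assumed procedure (not taken from the paper): seed the two classes,
    train the SVM on all frames, relabel every segment by the sign of its
    summed frame scores, and repeat until the labels stop changing.
    """
    means = [mean_vector(s) for s in segments]
    # Hypothetical seeding: segment 0 and the segment farthest from it.
    far = max(range(len(segments)), key=lambda i: sq_dist(means[i], means[0]))
    labels = [1 if sq_dist(m, means[0]) <= sq_dist(m, means[far]) else -1
              for m in means]
    frames = [x for s in segments for x in s]
    for _ in range(max_iter):
        frame_labels = [lab for s, lab in zip(segments, labels) for _ in s]
        w = train_linear_svm(frames, frame_labels)
        new_labels = [1 if sum(decision(w, x) for x in s) >= 0 else -1
                      for s in segments]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def synthetic_segments(n_segments=6, frames_per_segment=40, seed=1):
    """Toy data: alternating segments from two well-separated Gaussian 'speakers'."""
    rng = random.Random(seed)
    centres = [(2.0, 2.0), (-2.0, -2.0)]
    segments = []
    for k in range(n_segments):
        cx, cy = centres[k % 2]
        segments.append([(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
                         for _ in range(frames_per_segment)])
    return segments


if __name__ == "__main__":
    print(cluster_two_speakers(synthetic_segments()))
```

Segments are relabelled as a whole rather than frame by frame, which mirrors the dependence on segmentation quality noted above: if a segment mixes both speakers, no single label can be correct for it. The first algorithm (one model per speaker) could presumably follow the same loop with two separately trained models whose scores are compared, but that variant is not sketched here.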