Unsupervised idiolect discovery for speaker recognition

Abstract

Short-time spectral characterizations of the human voice have proven to be the most dependable features available to modern speaker recognition systems. However, it is well known that high-level linguistic information such as word usage and pronunciation patterns can provide complementary discriminative power. In an automatic setting, the availability of these idiolectal cues depends on access to a word or phonetic tokenizer, ideally in the given language and domain. In this paper, we propose a novel approach to speaker recognition that leverages recently developed zero-resource term discovery algorithms to identify speaker-characteristic lexical and phrasal acoustic patterns without the need for any supervised speech recognition tools. We use the enrollment audio itself to score each trial and perform no model training (supervised or unsupervised) at any stage of the processing, allowing immediate application to any language or domain. We evaluate our approach on the extended 8-conversation core condition of the 2010 NIST SRE and demonstrate a 16% relative (0.06 absolute) reduction in minDCF when combined with a state-of-the-art unsupervised i-vector cosine system.
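For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of cosine scoring between an enrollment vector and a test vector. It illustrates only the generic cosine similarity that an "i-vector cosine system" is built around. It is not the authors' implementation, and it does not cover the term-discovery approach or the way the two systems are combined. The function names, the toy vectors, and the threshold are assumptions made for illustration; real i-vectors are extracted from audio and have many more dimensions.

```python
import math


def cosine_score(enroll, test):
    """Return the cosine similarity between two equal-length vectors.

    `enroll` and `test` stand in for enrollment and test i-vectors
    (hypothetical toy inputs here, not real extracted i-vectors).
    """
    if len(enroll) != len(test):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_enroll = math.sqrt(sum(a * a for a in enroll))
    norm_test = math.sqrt(sum(b * b for b in test))
    if norm_enroll == 0.0 or norm_test == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_enroll * norm_test)


def accept_trial(score, threshold):
    """Accept the trial as a target (same speaker) if the score reaches the threshold.

    The threshold is an assumed, tunable operating point.
    """
    return score >= threshold


if __name__ == "__main__":
    enrollment_vector = [0.2, -0.1, 0.7]  # toy values
    test_vector = [0.25, -0.05, 0.6]      # toy values
    score = cosine_score(enrollment_vector, test_vector)
    print(score, accept_trial(score, threshold=0.5))
```

Because the score depends only on the angle between the two vectors, it needs no trained back-end model, which is consistent with the abstract's description of the baseline as unsupervised.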

DOI: 10.1109/ICASSP.2014.6853883

Cite this paper

@article{Jansen2014UnsupervisedID,
  title={Unsupervised idiolect discovery for speaker recognition},
  author={Aren Jansen and Daniel Garcia-Romero and Pascal Clark and Jaime Hernandez-Cordero},
  journal={2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2014},
  pages={1675-1679}
}