An evaluation of graph clustering methods for unsupervised term discovery
Short-time spectral characterizations of the human voice have proven to be the most dependable features available to modern speaker recognition systems. However, it is well-known that highlevel linguistic information such as word usage and pronunciation patterns can provide complementary discriminative power. In an automatic setting, the availability of these idiolectal cues is dependent on access to a word or phonetic tokenizer, ideally in the given language and domain. In this paper, we propose a novel approach to speaker recognition that leverages recently developed zero-resource term discovery algorithms to identify speaker-characteristic lexical and phrasal acoustic patterns without the need for any supervised speech recognition tools. We use the enrollment audio itself to score each trial and perform no model training (supervised or unsupervised) at any stage of the processing, allowing immediate application to any language or domain. We evaluate our approach on the extended 8-conversation core condition of the 2010 NIST SRE and demonstrate a 16% relative (0.06 absolute) reduction in minDCF when combined with a state-of-the-art unsupervised i-vector cosine system.