Speaker diarization using deep neural network embeddings

@article{GarciaRomero2017SpeakerDU,
  title={Speaker diarization using deep neural network embeddings},
  author={Daniel Garcia-Romero and David Snyder and Gregory Sell and Daniel Povey and Alan McCree},
  journal={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2017},
  pages={4930--4934}
}
Speaker diarization is an important front-end for many speech technologies in the presence of multiple speakers, but current methods that employ i-vector clustering for short segments of speech are potentially too cumbersome and costly for the front-end role. In this work, we propose an alternative approach for learning representations via deep neural networks to remove the i-vector extraction process from the pipeline entirely. The proposed architecture simultaneously learns a fixed…
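The abstract above is truncated before it describes how the learned representations are clustered. The following is a generic sketch of the downstream step it alludes to: grouping short-segment embeddings into speakers. It assumes cosine similarity and average-linkage agglomerative clustering with a hypothetical stopping `threshold`. The names `cosine` and `cluster_segments` are illustrative and this is not the authors' method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two segment embeddings (assumed scoring)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cluster_segments(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Average-linkage agglomerative clustering of segment embeddings.

    Starts with one cluster per segment and repeatedly merges the pair of
    clusters with the highest average pairwise similarity, stopping when no
    pair reaches `threshold` (a hypothetical tuning parameter). Returns one
    speaker label per segment.
    """
    n = len(embeddings)
    # Precompute all pairwise segment similarities once.
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average similarity over all cross-cluster segment pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, best_pair = -math.inf, None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = linkage(clusters[p], clusters[q])
                if s > best:
                    best, best_pair = s, (p, q)
        if best < threshold:
            break  # no remaining pair is similar enough to be the same speaker
        p, q = best_pair
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p is unaffected

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings": segments 0-1 resemble each other, as do 2-3.
    segs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
    print(cluster_segments(segs, threshold=0.5))
```

The stopping threshold stands in for an unknown number of speakers. A system that knows the speaker count could instead stop once that many clusters remain.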