Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
@inproceedings{Bhattacharya2018DeeplyFS,
  title={Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification},
  author={Gautam Bhattacharya and Md. Jahangir Alam and Vishwa Gupta and Patrick Kenny},
  booktitle={INTERSPEECH},
  year={2018}
}
Recently there has been a surge of interest in learning speaker embeddings using deep neural networks. These models ingest time-frequency representations of speech and can be trained to discriminate between a known set of speakers. While embeddings learned in this way perform well, they typically require a large number of training data points for learning. In this work we propose deeply fused speaker embeddings: speaker representations that combine neural speaker embeddings with i-vectors. We show…
8 Citations
Generative Adversarial Speaker Embedding Networks for Domain Robust End-to-end Speaker Verification
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
A novel approach for learning domain-invariant speaker embeddings using Generative Adversarial Networks, able to match the performance of a strong baseline x-vector system and significantly boost verification performance by averaging the different GAN models at the score level.
Adapting End-to-end Neural Speaker Verification to New Languages and Recording Conditions with Adversarial Training
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
This article applies speaker embeddings to the task of text-independent speaker verification, a challenging, real-world problem in biometric security, by combining a novel 1-dimensional, self-attentive residual network, an angular margin loss function and an adversarial training strategy.
SpeakerGAN: Recognizing Speakers in New Languages with Generative Adversarial Networks
- Computer Science
- 2018
This work presents a flexible and interpretable framework for learning domain invariant speaker embeddings using Generative Adversarial Networks and shows that proposed adversarial speaker embedding models significantly reduce the distance between source and target data distributions, while performing similarly on the former and better on the latter.
Speaker Diarisation Using 2D Self-attentive Combination of Embeddings
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
A generic framework is proposed to improve performance by combining different types of embeddings into a single embedding, referred to as a c-vector; it extends the standard self-attentive layer by averaging not only across time but also across different types of embeddings.
An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
An improved deep embedding learning method based on a convolutional neural network (CNN) is proposed for text-independent speaker verification, and a Baum-Welch statistics attention (BWSA) mechanism is applied in the pooling layer, which can integrate more useful long-term speaker characteristics in the temporal pooling layers.
Ensemble Additive Margin Softmax for Speaker Verification
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
Experiments on the large-scale VoxCeleb dataset show that AM-Softmax loss is better than traditional loss functions, and approaches using EAM-Softmax loss can outperform existing speaker verification methods to achieve state-of-the-art performance.
References
SHOWING 1-10 OF 27 REFERENCES
Deep Neural Network Embeddings for Text-Independent Speaker Verification
- Computer Science
- INTERSPEECH
- 2017
It is found that the embeddings outperform i-vectors for short speech segments and are competitive on long duration test conditions, which are the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora.
Deep Speaker: an End-to-End Neural Speaker Embedding System
- Computer Science, Physics
- ArXiv
- 2017
Results that suggest adapting from a model trained with Mandarin can improve accuracy for English speaker recognition are presented, and it is suggested that Deep Speaker outperforms a DNN-based i-vector baseline.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Deep neural network-based speaker embeddings for end-to-end speaker verification
- Computer Science
- 2016 IEEE Spoken Language Technology Workshop (SLT)
- 2016
It is shown that given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error-rate (EER) and at low miss rates.
Deep Neural Network based Text-Dependent Speaker Recognition: Preliminary Results
- Computer Science
- 2016
While the DNN models outperform the RNN, both models perform poorly compared to a GMM-UBM system, which serves as motivation for the further development of neural network based speaker verification approaches using global features.
Deep Speaker Feature Learning for Text-Independent Speaker Verification
- Computer Science
- INTERSPEECH
- 2017
This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning that can produce high-quality speaker features, and confirms that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and therefore can be extracted from just dozens of frames.
Front-End Factor Analysis For Speaker Verification
- Computer Science
- 2018 International Conference on Communications (COMM)
- 2018
This paper investigates which configuration and which parameters lead to the best performance of an i-vectors/PLDA based speaker verification system and presents at the end some preliminary experiments in which the utterances comprised in the CSTR VCTK corpus were used besides utterances from MIT-MDSVC for training the total variability covariance matrix and the underlying PLDA matrices.
End-to-end text-dependent speaker verification
- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly…
Improving DNN speaker independence with I-vector inputs
- Computer Science
- 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014
Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs), and the algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reduction.
FaceNet: A unified embedding for face recognition and clustering
- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.