Real Additive Margin Softmax for Speaker Verification

Lantian Li, Ruiqian Nai, Dong Wang
The additive margin softmax (AM-Softmax) loss has delivered remarkable performance in speaker verification. A supposed behavior of AM-Softmax is that it shrinks within-class variation by emphasizing target logits, which in turn improves the margin between target and non-target classes. In this paper, we conduct a careful analysis of the behavior of the AM-Softmax loss and show that this loss does not implement real max-margin training. Based on this observation, we present a Real AM-Softmax…
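To make the "emphasis on target logits" concrete, the following is a minimal sketch of the standard AM-Softmax loss for a single sample, as it is commonly formulated: the margin m is subtracted from the target-class cosine only, and all cosines are scaled by s. The function name, the default values s=30 and m=0.35, and the pure-Python form are assumptions made for illustration; they are not taken from this paper, and the Real AM-Softmax variant is not shown because the abstract above is truncated.

```python
import math

def am_softmax_loss(cosines, target, s=30.0, m=0.35):
    """Standard AM-Softmax loss for one sample (illustrative sketch).

    cosines: cosine similarities between the embedding and each class weight
    target:  index of the true class
    s, m:    scale and additive margin (assumed values, not from the paper)
    """
    # Subtract the margin from the target cosine only, then scale every logit.
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp over all classes.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Negative log-probability of the target class.
    return log_norm - logits[target]
```

With m = 0 this reduces to ordinary softmax cross-entropy on scaled cosines; a positive m raises the loss for the same cosines, which is the extra pressure on the target logit that the abstract refers to.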

Large Margin Softmax Loss for Speaker Verification
Ring loss and minimum hyperspherical energy criterion are introduced to further improve the performance of the large margin softmax loss with different configurations in speaker verification.
Max-margin metric learning for speaker recognition
Experiments conducted on the SRE08 core test show that compared to PLDA, the new approach can obtain comparable or even better performance, though the scoring is simply a cosine computation.
Angular Margin Centroid Loss for Text-Independent Speaker Recognition
This paper optimizes the cosine distances between speaker embeddings and their corresponding centroids, rather than the weight vectors in the classification layer, to enhance the intra-class compactness of speaker embeddings and explicitly improve inter-class separability.
Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
Three different margin-based losses, which not only separate classes but also demand a fixed margin between classes, are introduced to deep speaker embedding learning, and it is demonstrated that the margin is the key to obtaining more discriminative speaker embeddings.
Gaussian-constrained Training for Speaker Verification
A Gaussian-constrained training approach that discards the parametric classifier, and enforces the distribution of the derived speaker vectors to be Gaussian, leading to consistent performance improvement.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Additive Margin Softmax for Face Verification
A conceptually simple and intuitive learning objective function, i.e., additive margin softmax, for face verification, which performs better when the evaluation criteria are designed for a very low false alarm rate.
Deep Discriminative Embeddings for Duration Robust Speaker Verification
A novel algorithm to learn more discriminative utterance-level embeddings based on the Inception-ResNet speaker classifier is proposed, which outperforms the i-vector/PLDA framework for short utterances and is effective for long utterances.
Unified Hypersphere Embedding for Speaker Recognition
Experimental results suggest that simple repetition and random time-reversion of utterances can reduce prediction errors by up to 18%, and that the proposed logistic margin loss function leads to unified embeddings with state-of-the-art identification and competitive verification accuracies.
Generalized End-to-End Loss for Speaker Verification
A new loss function called generalized end-to-end (GE2E) loss is proposed, which makes the training of speaker verification models more efficient than the previous tuple-based end-to-end (TE2E) loss function.