• Corpus ID: 239024602

Rep Works in Speaker Verification

@article{Ma2021RepWI,
  title={Rep Works in Speaker Verification},
  author={Yufeng Ma and Miao Zhao and Yiwei Ding and Yu Zheng and Min Liu and Minqiang Xu},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.09720}
}
  • Yufeng Ma, Miao Zhao, Yiwei Ding, Yu Zheng, Min Liu, Minqiang Xu
  • Published 19 October 2021
  • Computer Science, Engineering
  • ArXiv
Multi-branch convolutional neural network architectures have attracted considerable attention in speaker verification, since aggregating multiple parallel branches can significantly improve performance. However, this design is inefficient at inference time because of the additional model parameters and operations. In this paper, we present RepSPKNet, a new multi-branch network architecture that uses a re-parameterization technique. With this technique, our backbone model contains… 
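The abstract is truncated here, so the details of RepSPKNet are not available on this page. As a rough illustration of the general idea behind structural re-parameterization (not the authors' implementation), the sketch below merges three parallel linear branches into one convolution kernel. The 1-D setting, the branch choice (3-tap, 1-tap, identity), and all function names are assumptions made for the example; batch normalization, which published re-parameterization methods typically fold into the kernel as well, is left out to keep it short.

```python
def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded to keep the length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(x))
    ]


def multi_branch(x, k3, k1):
    """Training-time form (hypothetical): 3-tap branch + 1-tap branch + identity."""
    y3 = conv1d_same(x, k3)
    y1 = conv1d_same(x, k1)
    return [a + b + c for a, b, c in zip(y3, y1, x)]


def merge_branches(k3, k1):
    """Fold the 1-tap kernel and the identity into the centre tap of the 3-tap kernel."""
    merged = list(k3)
    merged[1] += k1[0] + 1.0  # the identity acts as a 1-tap kernel with weight 1
    return merged


# Arbitrary illustrative values, not taken from the paper.
x = [0.5, -1.0, 2.0, 0.25, 1.5]
k3 = [0.25, -0.5, 0.75]
k1 = [2.0]

y_train = multi_branch(x, k3, k1)                 # three branches
y_infer = conv1d_same(x, merge_branches(k3, k1))  # one merged branch
max_diff = max(abs(a - b) for a, b in zip(y_train, y_infer))
print(max_diff)  # expected to be zero up to floating-point rounding
```

Because every branch is linear, their sum equals a single convolution whose kernel is the sum of the (zero-padded) branch kernels. The merged model therefore produces the same output with one branch's worth of inference cost.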

References

Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding
TLDR
The proposed serialized multi-layer multi-head attention is designed to aggregate and propagate attentive statistics from one layer to the next in a serialized manner and outperforms other baseline methods by 9.7% in EER and 8.1% in DCF10^-2.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
TLDR
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
BUT System Description to VoxCeleb Speaker Recognition Challenge 2019
TLDR
The submission of the Brno University of Technology (BUT) team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019 is described: a fusion of 4 Convolutional Neural Network (CNN) topologies. The best systems for the Fixed and Open conditions achieved 1.42% and 1.26% EER on the challenge evaluation set, respectively.
Diverse Branch Block: Building a Convolution as an Inception-like Unit
TLDR
A universal building block of Convolutional Neural Network (ConvNet) named Diverse Branch Block (DBB), which enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multiscale convolution, and average pooling.
VoxCeleb: A Large-Scale Speaker Identification Dataset
TLDR
This paper proposes a fully automated pipeline based on computer vision techniques to create a large scale text-independent speaker identification dataset collected 'in the wild', and shows that a CNN based architecture obtains the best performance for both identification and verification.
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve
The IDLAB VoxCeleb Speaker Recognition Challenge 2020 System Description
TLDR
This technical report describes the IDLAB top-scoring submissions for the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) in the supervised and unsupervised speaker verification tracks with a large margin fine-tuning strategy.
VoxCeleb2: Deep Speaker Recognition
TLDR
A very large-scale audio-visual speaker recognition dataset collected from open-source media is introduced and Convolutional Neural Network models and training strategies that can effectively recognise identities from voice under various conditions are developed and compared.
A study on data augmentation of reverberant speech for robust speech recognition
TLDR
It is found that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added, and the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario.
Front-End Factor Analysis for Speaker Verification
TLDR
Extending previous work, this paper proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using simple factor analysis, named the total variability space because it models both speaker and channel variabilities.