Rep Works in Speaker Verification
@article{Ma2021RepWI, title={Rep Works in Speaker Verification}, author={Yufeng Ma and Miao Zhao and Yiwei Ding and Yu Zheng and Min Liu and Minqiang Xu}, journal={ArXiv}, year={2021}, volume={abs/2110.09720} }
Multi-branch convolutional neural network architectures have attracted considerable attention in speaker verification, since aggregating multiple parallel branches can significantly improve performance. However, this design is inefficient at inference time because of the additional model parameters and operations. In this paper, we present a new multi-branch network architecture, RepSPKNet, that uses a re-parameterization technique. With this technique, our backbone model contains…
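The re-parameterization idea named in the abstract rests on the linearity of convolution: parallel branches whose outputs are summed can be folded into one kernel. The following is a minimal sketch under simplifying assumptions, not the authors' implementation. It uses 1-D signals, one 3-tap branch and one 1-tap branch, and no batch normalization or identity branch; all function and variable names are hypothetical.

```python
# Illustrative sketch only: fusing two parallel 1-D convolution branches
# (kernel sizes 3 and 1) into one kernel. Names are hypothetical and do
# not come from the paper; batch normalization is deliberately omitted.

def conv1d_same(signal, kernel):
    """Cross-correlate `signal` with an odd-length `kernel`, zero-padded to keep length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]

def fuse_branches(kernel3, kernel1):
    """Fold a 1-tap branch into the centre tap of a 3-tap branch."""
    fused = list(kernel3)
    fused[1] += kernel1[0]
    return fused

signal = [1.0, 2.0, -1.0, 0.5, 3.0]
k3 = [0.2, -0.5, 0.1]
k1 = [0.7]

# Multi-branch view: two branches evaluated separately, outputs summed.
multi_branch = [a + b for a, b in zip(conv1d_same(signal, k3), conv1d_same(signal, k1))]

# Single-path view: one convolution with the fused kernel.
single_path = conv1d_same(signal, fuse_branches(k3, k1))

# The two views agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(multi_branch, single_path))
```

In this toy setting the single-path model does one convolution instead of two while producing the same output, which is the efficiency argument the abstract makes for inference.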
One Citation
TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding
- Computer Science, ArXiv
- 2022
This paper proposes an effective temporal multi-scale (TMS) model in which multi-scale branches can be designed efficiently in a speaker embedding network with almost no increase in computational cost, and develops a systemic reparameterization method that converts the multi-branch network topology into a single-path topology to increase inference speed.
References
Showing 1-10 of 28 references
Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding
- Computer Science, Interspeech
- 2021
The proposed serialized multi-layer multi-head attention is designed to aggregate and propagate attentive statistics from one layer to the next in a serialized manner and outperforms other baseline methods by 9.7% in EER and 8.1% in DCF10^-2.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
- Computer Science, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
BUT System Description to VoxCeleb Speaker Recognition Challenge 2019
- Computer Science, ArXiv
- 2019
The submission of the Brno University of Technology (BUT) team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019 is described. It is a fusion of 4 Convolutional Neural Network (CNN) topologies, and the best systems for the Fixed and Open conditions achieved 1.42% and 1.26% EER on the challenge evaluation set, respectively.
Diverse Branch Block: Building a Convolution as an Inception-like Unit
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
Proposes a universal building block for Convolutional Neural Networks (ConvNets), named the Diverse Branch Block (DBB), which enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multiscale convolution, and average pooling.
VoxCeleb: A Large-Scale Speaker Identification Dataset
- Computer Science, Interspeech
- 2017
This paper proposes a fully automated pipeline based on computer vision techniques to create a large scale text-independent speaker identification dataset collected 'in the wild', and shows that a CNN based architecture obtains the best performance for both identification and verification.
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Computer Science, AAAI
- 2017
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve…
The IDLAB VoxCeleb Speaker Recognition Challenge 2020 System Description
- Computer Science, ArXiv
- 2020
This technical report describes the IDLAB top-scoring submissions for the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) in the supervised and unsupervised speaker verification tracks with a large margin fine-tuning strategy.
VoxCeleb2: Deep Speaker Recognition
- Computer Science, Interspeech
- 2018
A very large-scale audio-visual speaker recognition dataset collected from open-source media is introduced and Convolutional Neural Network models and training strategies that can effectively recognise identities from voice under various conditions are developed and compared.
A study on data augmentation of reverberant speech for robust speech recognition
- Computer Science, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
It is found that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added, and the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario.
Front-End Factor Analysis for Speaker Verification
- Computer Science, IEEE Transactions on Audio, Speech, and Language Processing
- 2011
Extending previous work that proposed a new speaker representation for speaker verification, this paper defines a new low-dimensional speaker- and channel-dependent space using a simple factor analysis. It is named the total variability space because it models both speaker and channel variabilities.