Corpus ID: 231662263

A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender

@article{Champion2021ASO,
  title={A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender},
  author={Pierre Champion and Denis Jouvet and Anthony Larcher},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.08478}
}
Speech pseudonymization aims at altering a speech signal to map the identifiable personal characteristics of a given speaker to another identity. In other words, it aims to hide the source speaker's identity while preserving the intelligibility of the spoken content. This study is conducted within the VoicePrivacy 2020 Challenge framework, where the baseline system performs pseudonymization by modifying x-vector information to match a target speaker while keeping the fundamental frequency (F0… 

Exploring the Importance of F0 Trajectories for Speaker Anonymization using X-vectors and Neural Waveform Models
TLDR
Modifying the F0 can improve speaker anonymization by as much as 8% with minor word-error-rate degradation, as evaluated with the VoicePrivacy Challenge 2020 framework and datasets.
Evaluating X-Vector-Based Speaker Anonymization Under White-Box Assessment
TLDR
This article proposes constraining the target selection to a specific identity, i.e., removing the random selection of identity, to evaluate the extreme threat under a white-box assessment (in which the attacker has complete knowledge of the system).
Differentially Private Speaker Anonymization
TLDR
Experimental results show that the generated utterances retain very high utility for automatic speech recognition training and inference, while being much better protected against strong adversaries who leverage the full knowledge of the anonymization process to try to infer the speaker identity.
A Tandem Framework Balancing Privacy and Security for Voice User Interfaces
TLDR
It is demonstrated that, to defend effectively against potential attacks on VUIs, it is necessary to investigate the attacks from multiple complementary perspectives and carefully account for the effects of deploying countermeasures, pointing to several promising research directions.
The VoicePrivacy 2020 Challenge: Results and findings

References

Speaker Anonymization Using X-vector and Neural Waveform Models
TLDR
A new approach to speaker anonymization is presented, which exploits state-of-the-art x-vector speaker representations and uses them to derive anonymized pseudo speaker identities through the combination of multiple, random speaker x-vectors.
F0-Consistent Many-To-Many Non-Parallel Voice Conversion Via Conditional Autoencoder
TLDR
This work modified and improved autoencoder-based voice conversion to disentangle content, F0, and speaker identity at the same time and can control the F0 contour, generate speech with F0 consistent with the target speaker, and significantly improve quality and similarity.
Phonetic posteriorgrams for many-to-one voice conversion without parallel data training
This paper proposes a novel approach to voice conversion with non-parallel training data. The idea is to bridge between speakers by means of Phonetic PosteriorGrams (PPGs) obtained from a
Individuality-Preserving Spectrum Modification for Articulation Disorders Using Phone Selective Synthesis
TLDR
A Hidden Markov Model (HMM)-based text-to-speech synthesis approach that preserves the voice individuality of those with articulation disorders and aids them in their communication.
Design Choices for X-vector Based Speaker Anonymization
TLDR
A flexible pseudo-speaker selection technique is presented as a baseline for the first VoicePrivacy Challenge and several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender selection are explored.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
TLDR
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion
TLDR
This article extends the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech.
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
TLDR
Experimental results show that neural end-to-end TTS models trained from the LibriTTS corpus achieved above 4.0 in mean opinion scores in naturalness in five out of six evaluation speakers.
Application-independent evaluation of speaker detection