Corpus ID: 226260577

Speaker information modification in the VoicePrivacy 2020 toolchain

@inproceedings{Champion2020SpeakerIM,
  title={Speaker information modification in the VoicePrivacy 2020 toolchain},
  author={Pierre Champion and Denis Jouvet and Anthony Larcher},
  year={2020}
}
This paper presents a study of the baseline system of the VoicePrivacy 2020 challenge. This baseline relies on a voice conversion system that aims at separating speaker identity and linguistic contents for a given speech utterance. To generate an anonymized speech waveform, the neural acoustic model and neural waveform model use the related linguistic content together with a selected pseudo-speaker identity. The linguistic content is estimated using bottleneck features extracted from a triphone… 
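The anonymization flow described above (bottleneck features for linguistic content, a selected pseudo-speaker identity, then a neural acoustic model feeding a neural waveform model) can be sketched as follows. All component functions here are dummy stand-ins with made-up dimensions, not the actual toolchain API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy stand-ins for the baseline's trained components (illustrative only).
def extract_bottleneck_features(wav, hop=160):
    # ASR bottleneck features: one 256-dim vector per 10 ms frame (16 kHz audio).
    return rng.standard_normal((len(wav) // hop, 256))

def extract_f0(wav, hop=160):
    # F0 trajectory, one value per frame; prosody is kept from the source.
    return np.full(len(wav) // hop, 120.0)

def acoustic_model(bn, f0, xvector):
    # Maps (bottleneck features, F0, pseudo-speaker x-vector) to a mel spectrogram.
    return rng.standard_normal((len(bn), 80))

def waveform_model(mel, hop=160):
    # Neural waveform model synthesizes speech from the mel spectrogram.
    return rng.standard_normal(len(mel) * hop)

def anonymize(wav, xvector_pool):
    bn = extract_bottleneck_features(wav)   # linguistic content
    f0 = extract_f0(wav)                    # prosody
    pseudo = xvector_pool.mean(axis=0)      # pseudo-speaker identity from an external pool
    return waveform_model(acoustic_model(bn, f0, pseudo))

wav = rng.standard_normal(16000)            # 1 s of dummy "speech" at 16 kHz
pool = rng.standard_normal((100, 512))      # hypothetical pool of 512-dim x-vectors
out = anonymize(wav, pool)                  # anonymized waveform, same length as input
```

The key design point is that only the x-vector input changes between original and anonymized synthesis; the linguistic and prosodic inputs are reused from the source utterance.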

Citations of this paper

Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models
This paper proposes a simpler self-supervised learning (SSL)-based method for language-independent speaker anonymization without any explicit language-dependent model, which can be easily used for other languages.
Speaker Anonymization with Phonetic Intermediate Representations
This work proposes a speaker anonymization pipeline that leverages high quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings, and outperforms baselines provided in the Voice Privacy Challenge 2020 in terms of privacy robustness against a lazy-informed attacker.
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions
It was found that the domain (e.g., language and channel) mismatch between the training and test data affected the neural waveform vocoder and anonymized speaker vectors, which limited the performance of the whole system.
Differentially Private Speaker Anonymization
Experimental results show that the generated utterances retain very high utility for automatic speech recognition training and inference, while being much better protected against strong adversaries who leverage the full knowledge of the anonymization process to try to infer the speaker identity.
Exploring the Importance of F0 Trajectories for Speaker Anonymization using X-vectors and Neural Waveform Models
Modifying the F0 can improve speaker anonymization by as much as 8% with only minor word-error-rate degradation, as evaluated with the VoicePrivacy Challenge 2020 framework and datasets.
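One simple F0-modification strategy of the kind compared in such work can be sketched as follows; this is an assumption-laden illustration (unvoiced frames encoded as F0 = 0, arbitrary target statistics), not the paper's exact method:

```python
import numpy as np

def transform_f0(f0, target_mean, target_std):
    # Shift/scale voiced frames toward target statistics; leave unvoiced (F0 == 0) frames alone.
    out = f0.astype(float).copy()
    voiced = out > 0
    src = out[voiced]
    out[voiced] = (src - src.mean()) / src.std() * target_std + target_mean
    return out

f0 = np.array([0.0, 118.0, 121.0, 125.0, 0.0])          # toy trajectory in Hz
shifted = transform_f0(f0, target_mean=200.0, target_std=10.0)
```

After the transform, the voiced frames match the target mean and standard deviation while their relative contour is preserved.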
Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices
Several voice conversion models based on self-supervised speech representations (Wav2Vec2.0, HuBERT, and UniSpeech) are trained to anonymize voices while preserving the speech characteristics needed to discriminate between healthy and pathological speech.
Supplementary material to the paper The VoicePrivacy 2020 Challenge: Results and findings
The VoicePrivacy 2020 Challenge focuses on developing anonymization solutions for speech technology; objective evaluation results for speaker verifiability, speech naturalness, and speech intelligibility are presented.
The VoicePrivacy 2020 Challenge: Results and findings

References

Showing 1-10 of 28 references
Speaker Anonymization Using X-vector and Neural Waveform Models
A new approach to speaker anonymization is presented, which exploits state-of-the-art x-vector speaker representations and derives anonymized pseudo-speaker identities by combining multiple random speaker x-vectors.
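A minimal sketch of that idea, assuming a precomputed pool of x-vectors and averaging as the combination step (pool size and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
pool = rng.standard_normal((200, 512))   # hypothetical pool of 512-dim x-vectors

def pseudo_xvector(pool, n=100):
    """Combine n randomly chosen pool x-vectors (here by averaging) into a pseudo identity."""
    idx = rng.choice(len(pool), size=n, replace=False)
    return pool[idx].mean(axis=0)

pseudo = pseudo_xvector(pool)            # 512-dim pseudo-speaker x-vector
```

Because the result averages many speakers, it does not correspond to any single real identity in the pool.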
Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?
This work studies the extent to which users can be recognized from the encoded representation of their speech produced by a deep encoder-decoder architecture trained for ASR, and proposes adversarial training to learn representations that perform well in ASR while hiding speaker identity.
Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning
A model-independent, neural-network-based feature extractor is introduced to obtain speaker-invariant bottleneck features for zero-resource languages.
To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition
The results show that deep models trained on large datasets already develop speaker-invariant representations without any auxiliary loss, and that models trained in a semi-supervised manner can reduce error rates.
Phonetic posteriorgrams for many-to-one voice conversion without parallel data training
This paper proposes a novel approach to voice conversion with non-parallel training data. The idea is to bridge between speakers by means of Phonetic PosteriorGrams (PPGs) obtained from a speaker-independent automatic speech recognition (SI-ASR) system.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Learning Anonymized Representations with Adversarial Neural Networks
A novel training objective is introduced for simultaneously training a predictor over target variables of interest (the regular labels) while preventing an intermediate representation from being predictive of the private labels.
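A toy illustration of such an objective (a sketch under assumptions, not the paper's implementation): the encoder minimizes the task loss while maximizing the loss of a private-label predictor, written here as a weighted difference of cross-entropies:

```python
import numpy as np

def cross_entropy(probs, label):
    # Negative log-likelihood of the true label under the predicted distribution.
    return -np.log(probs[label])

def encoder_objective(task_probs, task_label, private_probs, private_label, lam=1.0):
    # Low task loss is rewarded; being predictive of the private label is penalized.
    return cross_entropy(task_probs, task_label) - lam * cross_entropy(private_probs, private_label)

# Confident on the task, at chance on the private label -> negative (good) objective.
obj = encoder_objective(np.array([0.9, 0.1]), 0, np.array([0.5, 0.5]), 1)
```

In practice this min-max trade-off is usually implemented with a separate adversary network (or a gradient reversal layer) rather than a single scalar difference, but the objective has the same shape.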
Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors
Experimental results demonstrate that PPGs improve both naturalness and speaker similarity of the converted speech, and that both speaker codes and d-vectors can be used in the VAE-based many-to-many non-parallel VC.
Introducing the VoicePrivacy Initiative
The voice anonymization task selected for the VoicePrivacy 2020 Challenge is formulated and the datasets used for system development and evaluation are described, including two anonymization baselines and objective evaluation results.
...