Semi-supervised voice conversion with amortized variational inference

Cory Stephenson, Gokce Keskin, Anil Thomas, Oguz H. Elibol
In this work we introduce a semi-supervised approach to the voice conversion problem, in which speech from a source speaker is converted into speech of a target speaker. The proposed method makes use of both parallel and non-parallel utterances from the source and target simultaneously during training. This approach can be used to extend existing parallel data voice conversion systems such that they can be trained with semi-supervision. We show that incorporating semi-supervision improves the… 
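The abstract describes training on parallel and non-parallel utterances simultaneously. As a minimal sketch of that idea (not the paper's actual model: the function names, the plain MSE autoencoder standing in for the amortized-VI components, and the `alpha` weighting are all illustrative assumptions), a combined objective might mix a supervised conversion term on aligned frame pairs with an unsupervised reconstruction term on unpaired frames:

```python
import numpy as np

def semi_supervised_loss(parallel_pairs, unpaired, encode, decode, alpha=0.5):
    """Hypothetical combined objective for semi-supervised voice conversion.

    parallel_pairs: list of (source_frame, target_frame) aligned feature pairs
    unpaired: list of feature frames with no aligned counterpart
    encode/decode: stand-ins for the model's inference and generation networks
    alpha: weight trading off the supervised and unsupervised terms
    """
    # Supervised term: convert source frames and compare to aligned target frames.
    sup = np.mean([np.mean((decode(encode(s)) - t) ** 2)
                   for s, t in parallel_pairs])
    # Unsupervised term: reconstruct unpaired frames through the same model.
    unsup = np.mean([np.mean((decode(encode(x)) - x) ** 2)
                     for x in unpaired])
    return alpha * sup + (1 - alpha) * unsup
```

In the paper's setting the unsupervised term would be a variational lower bound rather than a plain reconstruction error; the sketch only illustrates how both data types contribute to one training objective.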


Voice Conversion Based on Deep Neural Networks for Time-Variant Linear Transformations

A novel voice conversion framework is described that improves conversion performance relative to the amount of training data: top-down knowledge the authors already have is introduced into the models as a prior, instead of preparing a large amount of data.

Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation

This work adapts probabilistic linear discriminant analysis (PLDA) and the i-vector method to voice conversion, requiring neither parallel utterances, transcriptions, nor time-alignment procedures at any stage.

Semi-supervised training of a voice conversion mapping function using a joint-autoencoder

This study proposes a novel Stacked Joint-Autoencoder (SJAE) architecture, which aims to find a common encoding of parallel source and target features to increase conversion performance.

Non-parallel training for voice conversion by maximum likelihood constrained adaptation

This work proposes a voice conversion method that does not require a parallel corpus for training, and shows that adaptation reduces the error obtained when simply applying the conversion parameters of one pair of speakers to another by a factor that can reach 30% in many cases.

INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora

This paper proposes a new iterative alignment method that allows pairing phonetically equivalent acoustic vectors from nonparallel utterances from different speakers, even under cross-lingual conditions, and it does not require any phonetic or linguistic information.

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

A brief summary of the state-of-the-art techniques for VC is presented, followed by a detailed explanation of the challenge tasks and the results that were obtained.

Voice conversion using deep neural networks with speaker-independent pre-training

In this study, we trained a deep autoencoder to build compact representations of short-term spectra of multiple speakers. Using this compact representation as mapping features, we then trained an…

An overview of voice conversion systems

The Voice Conversion Challenge 2016

The design of the challenge and its results are summarized, along with a future plan to share views on unsolved problems and challenges faced by current VC techniques.

Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks

This paper proposes a sequence-based conversion method using DBLSTM-RNNs to model not only the frame-wise relationship between the source and the target voice, but also the long-range context dependencies in the acoustic trajectory.

Voice conversion in high-order eigen space using deep belief nets

This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the…