Robust Low Rate Speech Coding Based on Cloned Networks and Wavenet

  @inproceedings{lim2020robust,
    title={Robust Low Rate Speech Coding Based on Cloned Networks and Wavenet},
    author={Felicia S. C. Lim and W. Kleijn and Michael Chinen and Jan Skoglund},
    booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year={2020}
  }
  • Published 1 May 2020
  • Computer Science
Rapid advances in machine-learning based generative modeling of speech make its use in speech coding attractive. However, the current performance of such models drops rapidly with noise contamination of the input, preventing use in practical applications. We present a new speech-coding scheme that is based on features that are robust to the distortions occurring in speech-coder input signals. To this purpose, we encourage the feature encoder to provide the same independent features for each of… 
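The robustness idea in the abstract — encouraging the encoder to produce the same features for clean and distorted versions of an input — can be illustrated with a feature-consistency penalty. This is a generic sketch; the function name and the squared-error choice are ours, not the paper's exact loss:

```python
import numpy as np

def feature_consistency_loss(f_clean, f_distorted):
    """Mean squared difference between the features extracted from a clean
    utterance and from a distorted copy of it.  Driving this toward zero
    pushes the encoder toward features that are robust to the distortion."""
    f_clean = np.asarray(f_clean, dtype=float)
    f_distorted = np.asarray(f_distorted, dtype=float)
    return float(np.mean((f_clean - f_distorted) ** 2))

# Identical features incur no penalty; diverging features are penalized.
same = feature_consistency_loss([1.0, 2.0], [1.0, 2.0])
diff = feature_consistency_loss([1.0, 2.0], [1.5, 2.5])
```

In training, such a term would be added to the coder's reconstruction objective, so the encoder cannot exploit noise-dependent features.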


Generative Speech Coding with Predictive Variance Regularization
  • W. Kleijn, Andrew Storus, Hengchin Yeh
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
This work introduces predictive-variance regularization to reduce the sensitivity to outliers and provides extensive subjective performance evaluations that show that the system based on generative modeling provides state-of-the-art coding performance at 3 kb/s for real-world speech signals at reasonable computational complexity.
Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable-quality speech output.
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem
Evaluation results based on the WSJ0-2mix and VCTK-noisy corpora in various settings show that the proposed method can steadily synthesize the separated speech with high speech quality and without any interference, which is difficult to avoid in regression-based methods.
A Codec Simulation for Low-rate Speech Coding with Radial Neural Networks
The simulation of a three-channel transmission system shows that using a radial neural network as part of a predictive codec can reduce the dynamic range of signals by up to 70% while maintaining communication quality for each of the signal types used.
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
To generate a disentangled representation, low-bitrate representations are extracted for speech content, prosodic information, and speaker identity, which are then used to synthesize speech in a controllable manner from self-supervised discrete representations.
Source-Aware Neural Speech Coding for Noisy Speech Compression
The proposed source-aware neural audio coding (SANAC) system harmonizes a deep autoencoder-based source separation model and a neural coding system, so that it can explicitly perform source separation and coding in the latent space.


Wavenet Based Low Rate Speech Coding
This work describes how a WaveNet generative speech model can be used to generate high-quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s, and shows that the system additionally performs implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener.
Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder
This work demonstrates that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality.
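The VQ-VAE bottleneck used in the paper above discretizes encoder features by nearest-neighbour lookup in a learned codebook; the indices form the low-bit-rate stream and the looked-up vectors condition the WaveNet decoder. A minimal NumPy sketch (function names are ours, and the codebook here is random rather than learned):

```python
import numpy as np

def vq_encode(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    features: (T, D) array of encoder outputs; codebook: (K, D) array.
    Returns (T,) integer indices -- the discrete symbols to transmit.
    """
    # Squared Euclidean distance between every frame and every code vector.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def vq_decode(indices, codebook):
    """Recover the quantized vectors a decoder would condition on."""
    return codebook[indices]

# Toy example: a 4-entry codebook of 2-D features => 2 bits per frame.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 2))
features = rng.normal(size=(10, 2))
idx = vq_encode(features, codebook)
recon = vq_decode(idx, codebook)
```

With K codebook entries the rate is log2(K) bits per frame, independent of the feature dimension D.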
WaveNet-Based Zero-Delay Lossless Speech Coding
Experimental results show that the proposed coding technique can transmit speech audio waveforms at 50% of their original bit rate, and the WaveNet-based speech coder remains effective for unknown speakers.
WaveNet: A Generative Model for Raw Audio
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
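WaveNet's core building block is the causal dilated convolution: each output sample depends only on present and past inputs, and stacking layers with doubling dilations grows the receptive field exponentially with depth. A NumPy sketch of the mechanism (not the full gated architecture):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution with the given dilation.

    x: (T,) input signal; w: (k,) filter taps.  Output y[t] depends only on
    x[t], x[t-dilation], ..., x[t-(k-1)*dilation] -- never on future samples.
    """
    k = len(w)
    pad = (k - 1) * dilation           # left-pad to stay causal, same length
    xp = np.concatenate([np.zeros(pad), x])
    return sum(w[i] * xp[pad - i * dilation : pad - i * dilation + len(x)]
               for i in range(k))

# Stacking 2-tap layers with dilations 1, 2, 4, 8 spreads a unit impulse
# over 1 + (1 + 2 + 4 + 8) = 16 samples, all of them in the past.
x = np.zeros(32)
x[0] = 1.0                             # unit impulse
w = np.array([0.5, 0.5])
y = x
for d in (1, 2, 4, 8):
    y = causal_dilated_conv(y, w, d)
```

Doubling the dilation per layer is what lets WaveNet cover the thousands-of-samples contexts needed for raw audio with only tens of layers.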
Encoding speech using prototype waveforms
  • W. Kleijn
  • Engineering
    IEEE Trans. Speech Audio Process.
  • 1993
The coding method is easily combined with existing LP-based speech coders, such as CELP, for unvoiced signals, and excellent voiced speech quality is obtained at rates between 3.0 and 4.0 kb/s.
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
It is demonstrated that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP, and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bit rate, opening the way for new codec designs based on neural synthesis models.
Efficient Neural Audio Synthesis
WaveRNN, a single-layer recurrent neural network with a dual softmax layer, matches the quality of the state-of-the-art WaveNet model; a new generation scheme based on subscaling folds a long sequence into a batch of shorter sequences and allows multiple samples to be generated at once.
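The folding step behind subscaling can be sketched as a simple interleaved reshape: a length-T sequence becomes a batch of shorter subsequences that can be generated largely in parallel. This is only the indexing idea; the real WaveRNN scheme also conditions each subsequence on previously generated ones. Function names are ours:

```python
import numpy as np

def subscale_fold(x, batch):
    """Fold a length-T sequence into `batch` interleaved subsequences.

    Sample i of subsequence b is x[i * batch + b], so each subsequence
    covers the whole signal at 1/batch the rate.
    """
    return x.reshape(-1, batch).T      # shape (batch, T // batch)

def subscale_unfold(folded):
    """Interleave the subsequences back into the original ordering."""
    return folded.T.reshape(-1)

x = np.arange(16)
folded = subscale_fold(x, 4)           # 4 subsequences of length 4
```

Generating the 4 subsequences as a batch trades sequential steps for batch parallelism, which is what makes sampling faster than one-sample-at-a-time generation.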
High-quality Speech Coding with SampleRNN
We provide a speech coding scheme employing a generative model based on SampleRNN that, while operating at significantly lower bitrates, matches or surpasses the perceptual quality of…
Adaptive predictive coding of speech signals
Preliminary studies suggest that the binary difference signal and the predictor parameters together can be transmitted at approximately 10 kilobits/second which is several times less than the bit rate required for log-PCM encoding with comparable speech quality.
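The "binary difference signal" idea — transmitting only one bit per sample about the prediction error, with an adaptive quantizer — can be illustrated with 1-bit adaptive delta modulation. The step-adaptation rule below is a generic textbook one, not the cited paper's scheme:

```python
import numpy as np

def adaptive_delta_codec(x):
    """1-bit coding of the prediction error, with an adaptive step size.

    Only the sign of the difference between the signal and the running
    prediction is "transmitted" -- a binary difference signal.  The decoder
    reconstructs the same running prediction from the bits alone.
    """
    bits, recon = [], []
    pred, step, prev = 0.0, 0.1, None
    for s in x:
        b = 1 if s >= pred else 0      # the transmitted bit
        if prev is not None:
            # Repeated bits mean the predictor lags the signal (slope
            # overload): grow the step.  Alternating bits mean granular
            # noise: shrink it.
            step *= 1.5 if b == prev else 0.66
        step = min(1.0, max(1e-3, step))
        pred += step if b else -step   # decoder-side reconstruction
        bits.append(b)
        recon.append(pred)
        prev = b
    return bits, np.array(recon)

# One bit per sample suffices to track a slowly varying signal.
x = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))
bits, recon = adaptive_delta_codec(x)
```

Replacing the one-tap running prediction with an adaptive linear predictor, as in the cited work, is what brings the rate down toward the quoted ~10 kb/s at comparable quality.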
Speaker-dependent WaveNet-based Delay-free ADPCM Speech Coding
A delay-free adaptive differential pulse code modulation (ADPCM) speech coding system based on WaveNet, a state-of-the-art model for neural-network-based speech waveform synthesis, is proposed to improve speech quality; it outperformed the conventional ADPCM system based on ITU-T Recommendation G.726.