NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling

Authors: Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao
For deep learning-based speech enhancement (SE) systems, the training-test acoustic mismatch can cause notable performance degradation. To address the mismatch issue, numerous noise adaptation strategies have been derived. In this paper, we propose a novel method, called noise adaptive speech enhancement with target-conditional resampling (NASTAR), which reduces mismatches with only one sample (one-shot) of noisy speech in the target environment. NASTAR uses a feedback mechanism to simulate…

Dynamic Noise Embedding: Noise Aware Training and Adaptation for Speech Enhancement

To estimate noise-only frames, voice activity detection (VAD) is employed to detect non-speech frames by applying an optimal threshold to the speech posterior. These estimated frames are then used to extract a dynamic noise embedding (DNE), which helps an SE module capture the characteristics of the background noise.
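
The thresholding step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the threshold value of 0.5, and the use of a plain average as a stand-in for the learned embedding extractor are all assumptions made for the example.

```python
def select_noise_frames(speech_posterior, threshold=0.5):
    """Flag frames whose speech posterior falls below the threshold.

    The threshold value is a hypothetical choice for illustration.
    """
    return [p < threshold for p in speech_posterior]


def mean_noise_vector(features, noise_mask):
    """Average the feature vectors of the frames flagged as noise-only.

    Assumption: a simple mean stands in for a learned noise embedding.
    Returns a zero vector when no frame is flagged.
    """
    selected = [f for f, is_noise in zip(features, noise_mask) if is_noise]
    if not selected:
        return [0.0] * len(features[0])
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


# Toy input: four frames with a per-frame speech posterior and 2-D features.
posterior = [0.9, 0.1, 0.2, 0.8]
feats = [[1.0, 1.0], [2.0, 4.0], [4.0, 6.0], [9.0, 9.0]]

mask = select_noise_frames(posterior, threshold=0.5)
noise_vec = mean_noise_vector(feats, mask)
```

Here the second and third frames fall below the threshold, so only their features contribute to the noise vector.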

Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement

Experimental results show that using noise tokens (NTs) is an effective strategy that consistently improves the generalization ability of SE systems across different DNN architectures. The paper also investigates applying a state-of-the-art neural vocoder to generate the waveform instead of the traditional inverse STFT (ISTFT).

Noise Adaptive Speech Enhancement using Domain Adversarial Training

A novel noise adaptive speech enhancement (SE) system that employs a domain adversarial training (DAT) approach to tackle the noise-type mismatch between training and testing conditions, and that provides significant improvements in PESQ, SSNR, and STOI over the SE system without adaptation.

SERIL: Noise Adaptive Speech Enhancement using Regularization-based Incremental Learning

A regularization-based incremental learning SE (SERIL) strategy, complementing existing noise adaptation strategies without using additional storage, that can effectively adjust itself to new noise environments while overcoming the catastrophic forgetting issue.

A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers

An environment adaptation approach that transfers an existing deep neural network (DNN) speech enhancer to specific noisy environments, without the noisy/clean paired target waveforms needed in conventional DNN-based spectral regression, by minimizing the Kullback-Leibler divergence between posterior probabilities produced by a multi-condition senone classifier fed with noisy speech features.
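
For reference, the Kullback-Leibler divergence between two discrete posterior distributions has the standard form below. The symbols are assumptions introduced for illustration: p and q denote two posterior distributions over K senone classes, and the summary above does not specify which classifier output plays which role.

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{k=1}^{K} p_k \log \frac{p_k}{q_k}
```

The divergence is zero only when the two distributions are identical, which is why minimizing it pulls one set of posteriors toward the other.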

A Study of Training Targets for Deep Neural Network-Based Speech Enhancement Using Noise Prediction

Objective test results show that the mask-based targets are superior to the spectral magnitude target in the noise-prediction framework and that the best noise target outperforms the speech-prediction network in terms of objective quality and intelligibility metrics in seen noise conditions.

Speech enhancement based on deep denoising autoencoder

Experimental results show that increasing the depth of the DAE consistently improves performance when a large training data set is given, and that, compared with a minimum mean square error based speech enhancement algorithm, the proposed DAE provides superior performance on the three objective evaluations.

Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder

This paper presents a neural speech enhancement method with a statistical feedback mechanism based on a denoising variational autoencoder (VAE); the method outperforms existing mask-based and generative enhancement methods in unknown conditions.

Adversarial training for data-driven speech enhancement without parallel corpus

Experimental results show that the speech enhancement approach achieves improved ASR performance compared with results obtained with unprocessed signals, and achieves ASR performance comparable to that obtained with a model trained on a parallel corpus using a minimum mean squared error (MMSE) criterion.