Blind Signal Dereverberation for Machine Speech Recognition

Samik Sadhu and Hynek Hermansky
We present a method to remove unknown convolutive noise introduced to speech by reverberations of recording environments, utilizing some amount of training speech data from the reverberant environment, and any available non-reverberant speech data. Using a Fourier transform computed over long temporal windows, which ideally cover the entire room impulse response, we convert room-induced convolution into addition in the log spectral domain. Next, we compute a spectral normalization vector from…
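The central observation in the abstract, that a sufficiently long analysis window turns convolution with the room impulse response into addition of log spectra, can be checked numerically. The following is a minimal sketch and not the authors' implementation: the signals are synthetic (white noise standing in for speech, exponentially decaying noise standing in for a room impulse response), and the sampling rate, lengths, and variable names are all assumptions made for illustration. It stops at the additivity property and does not attempt the normalization step, which the truncated abstract does not describe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not data from the paper).
speech = rng.standard_normal(16000)  # 1 s of "speech" at an assumed 16 kHz
rir = rng.standard_normal(4000) * np.exp(-np.arange(4000) / 800.0)  # decaying "room response"

# Reverberant signal: linear convolution of the source with the room response.
reverberant = np.convolve(speech, rir)

# Long window: the FFT length covers the entire convolved signal, so the
# spectrum of the convolution equals the product of the two spectra.
n_fft = len(reverberant)
log_speech = np.log(np.abs(np.fft.rfft(speech, n_fft)))
log_rir = np.log(np.abs(np.fft.rfft(rir, n_fft)))
log_rev = np.log(np.abs(np.fft.rfft(reverberant, n_fft)))
max_err = np.max(np.abs(log_rev - (log_speech + log_rir)))

# Short window: a 512-sample frame is much shorter than the room response,
# so the same additive relation no longer holds.
start, frame = 1000, 512
log_speech_s = np.log(np.abs(np.fft.rfft(speech[start:start + frame])))
log_rev_s = np.log(np.abs(np.fft.rfft(reverberant[start:start + frame])))
log_rir_s = np.log(np.abs(np.fft.rfft(rir, frame)))  # rir truncated to the frame length
short_err = np.max(np.abs(log_rev_s - (log_speech_s + log_rir_s)))

print(f"long-window additivity error:  {max_err:.2e}")
print(f"short-window additivity error: {short_err:.2e}")
```

With the long window the additivity error sits at floating-point level, while the short frame shows a clear mismatch, which is the motivation for windows that cover the whole room impulse response.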


Radically Old Way of Computing Spectra: Applications in End-to-End ASR

A technique for computing spectrograms with Frequency Domain Linear Prediction (FDLP), which fits all-pole models to the squared Hilbert envelope of speech in different frequency sub-bands, is shown to give relative WER improvements of up to 25% and 22% over mel spectrograms.

Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech

This paper proposes a modification of the conventional FDLP model that allows easy interpretability of the complex cepstrum as temporal modulations in an all-pole model approximation of the power of the speech signal.

Speech Dereverberation and Denoising Based on Time Varying Speech Model and Autoregressive Reverberation Model

This chapter reviews a model-based dereverberation method developed by the authors that is effectively combined with a traditional denoising technique, specifically a multichannel Wiener filter.

Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening

This paper generalizes existing dereverberation methods based on subband-domain multi-channel linear prediction filters, so that the resulting algorithm can blindly shorten a multiple-input multiple-output room impulse response between an unknown number of sources and a microphone array.

Blind deconvolution through digital signal processing

This paper addresses the problem of deconvolving two signals when both are unknown. The authors call this problem blind deconvolution. The discussion develops two related solutions which can be

Echo removal by discrete generalized linear filtering.

It is shown that homomorphic deconvolution is a useful approach to either removal or detection of echoes in signal-analysis and signal-processing problems such as speech analysis and echo removal and detection.

Librispeech: An ASR corpus based on public domain audio books

It is shown that acoustic models trained on LibriSpeech give a lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself.

ESPnet: End-to-End Speech Processing Toolkit

The paper explains the major architecture of this software platform, several important functionalities that differentiate ESPnet from other open-source ASR toolkits, and experimental results on major ASR benchmarks.

Nonlinear filtering of multiplied and convolved signals

An approach to some nonlinear filtering problems through a generalized notion of superposition has proven useful. In this paper this approach is investigated for the nonlinear filtering of signals