Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar and Sriram Ganapathy


End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes

Envelope dereverberation, feature extraction and acoustic modeling with a transformer-based E2E ASR system can all be jointly optimized for the speech recognition task.



Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

The proposed neural enhancement model performs an envelope-gain-based enhancement of the temporal envelopes and consists of a series of convolutional and recurrent neural network layers that are used to generate features for automatic speech recognition (ASR).

Far-Field Speech Recognition Using Multivariate Autoregressive Models

This paper proposes a novel method of speech feature extraction using multivariate AR modeling (MAR) of temporal envelopes, and performs several speech recognition experiments in the REVERB Challenge database for single and multi-microphone settings.

Multivariate Autoregressive Spectrogram Modeling for Noisy Speech Recognition

  • S. Ganapathy, IEEE Signal Processing Letters, 2017
The subband discrete cosine transform coefficients obtained from multiple speech bands are used in the MAR framework to derive the Riesz temporal envelopes, which serve as features for ASR and yield significant improvements over other noise-robust feature extraction methods.
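The core of the MAR framework is a vector linear-prediction model fitted jointly across subbands. A minimal sketch of that fitting step, assuming NumPy (`fit_mar` is an illustrative helper name, not from the paper; the subsequent envelope derivation is omitted):

```python
import numpy as np

def fit_mar(C, order):
    """Least-squares fit of a multivariate AR model
    c_t = sum_p A_p c_{t-p} + e_t, where C holds one
    subband-coefficient vector per column (bands x frames)."""
    B, T = C.shape
    Y = C[:, order:]                                                 # prediction targets
    X = np.vstack([C[:, order - p:T - p] for p in range(1, order + 1)])
    A = Y @ np.linalg.pinv(X)                                        # stacked [A_1 ... A_order]
    E = Y - A @ X                                                    # residual (excitation)
    return A.reshape(B, order, B), E
```

Fitting all bands jointly (rather than one AR model per band) lets the coefficient matrices `A_p` capture cross-band correlations, which is what distinguishes MAR from per-band FDLP.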

Recognition of Reverberant Speech Using Frequency Domain Linear Prediction

This work presents a feature extraction technique based on modeling the temporal envelopes of the speech signal in narrow subbands using frequency domain linear prediction (FDLP), which provides an all-pole approximation of the signal's Hilbert envelope, obtained by applying linear prediction to the cosine transform of the signal.
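The FDLP idea can be sketched compactly: take the DCT of the signal, run standard autocorrelation-method linear prediction on the DCT coefficients, and evaluate the resulting all-pole spectrum, which now approximates the squared Hilbert envelope over time. A rough NumPy illustration (helper names are ours, not the paper's):

```python
import numpy as np

def dct2(x):
    """DCT-II of a 1-D signal via an explicit cosine basis."""
    n = len(x)
    k, m = np.arange(n)[:, None], np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x

def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation -> LP coefficients."""
    a, err = [1.0], r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
    return np.array(a), err

def fdlp_envelope(signal, order=20, npts=128):
    """All-pole (FDLP) approximation of the temporal envelope:
    linear prediction applied to the cosine transform of the signal."""
    c = dct2(np.asarray(signal, dtype=float))
    r = np.correlate(c, c, "full")[len(c) - 1:len(c) + order]
    a, gain = levinson(r, order)
    spec = np.fft.rfft(a, 2 * npts)[:npts]
    return gain / np.abs(spec) ** 2
```

By time-frequency duality, peaks of this "spectrum" of DCT coefficients correspond to time instants where the signal's energy is concentrated.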

Speech Dereverberation With Context-Aware Recurrent Neural Networks

  • J. F. Santos, T. Falk, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018
The proposed model performs speech dereverberation by estimating the clean spectral magnitude from its reverberant counterpart, and outperforms a recently proposed model that uses different context information depending on the reverberation time, without requiring any additional input.

Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction

Variance-normalized delayed linear prediction (NDLP) can robustly estimate an inverse system for late reverberation in the presence of noise without greatly distorting the direct speech signal, and can be implemented efficiently in the time-frequency domain.
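The technique predicts each STFT frame from frames at least `delay` steps in the past (so the direct path and early reflections are excluded from the prediction), weights the least-squares fit by the inverse of the estimated direct-signal variance, and subtracts the prediction as the late-reverberation estimate. A single-channel sketch under these assumptions (`ndlp_dereverb` is our name; the published method is multi-channel and more refined):

```python
import numpy as np

def ndlp_dereverb(Y, delay=3, order=10, iters=3, eps=1e-8):
    """Variance-normalized delayed linear prediction, sketched per
    frequency bin of a complex STFT matrix Y (freq x time)."""
    F, T = Y.shape
    D = np.empty_like(Y)
    for f in range(F):
        y = Y[f]
        # Delayed convolution matrix: column k holds y shifted by delay+k.
        X = np.zeros((T, order), dtype=complex)
        for k in range(order):
            s = delay + k
            X[s:, k] = y[:T - s]
        d = y.copy()
        for _ in range(iters):
            lam = np.maximum(np.abs(d) ** 2, eps)     # variance normalization
            Xw = X / lam[:, None]
            G = Xw.conj().T @ X + eps * np.eye(order)
            g = np.linalg.solve(G, Xw.conj().T @ y)   # weighted least squares
            d = y - X @ g                             # direct-signal estimate
        D[f] = d
    return D
```

The prediction delay is what prevents the filter from whitening (and thus distorting) the direct speech signal itself.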

3-D Acoustic Modeling for Far-Field Multi-Channel Speech Recognition

The proposed 3-D feature and acoustic modeling approach provides significant improvements over an ASR system trained with beamformed audio (average relative word-error-rate improvements of 16% and 6% on the CHiME-3 and REVERB Challenge datasets, respectively).

Learning Spectral Mapping for Speech Dereverberation and Denoising

Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech, which substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple.
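The spectral-mapping recipe is: splice each reverberant log-magnitude frame with its neighbors for context, then regress the corresponding clean frame with a feedforward network trained on mean-squared error. A toy NumPy version with manual gradients (`splice` and `train_mapper` are illustrative names; the paper uses much larger DNNs):

```python
import numpy as np

def splice(S, c):
    """Stack each frame with its +-c neighbors (context window)."""
    T = S.shape[1]
    idx = np.clip(np.arange(T)[None, :] + np.arange(-c, c + 1)[:, None], 0, T - 1)
    return np.concatenate([S[:, i] for i in idx], axis=0)

def train_mapper(X, Yc, hidden=32, lr=0.1, steps=200, seed=0):
    """One-hidden-layer network trained by gradient descent to map
    spliced reverberant frames X to clean frames Yc (both dims x frames)."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((hidden, X.shape[0])); b1 = np.zeros((hidden, 1))
    W2 = 0.1 * rng.standard_normal((Yc.shape[0], hidden)); b2 = np.zeros((Yc.shape[0], 1))
    losses = []
    for _ in range(steps):
        H = np.tanh(W1 @ X + b1)                 # hidden activations
        P = W2 @ H + b2                          # predicted clean frames
        err = P - Yc
        losses.append(float(np.mean(err ** 2)))
        dP = 2.0 * err / err.size                # grad of the MSE loss
        dW2 = dP @ H.T; db2 = dP.sum(1, keepdims=True)
        dZ = (W2.T @ dP) * (1.0 - H ** 2)        # backprop through tanh
        dW1 = dZ @ X.T; db1 = dZ.sum(1, keepdims=True)
        W2 -= lr * dW2; b2 -= lr * db2; W1 -= lr * dW1; b1 -= lr * db1
    return (W1, b1, W2, b2), losses
```

At test time the estimated magnitudes are typically recombined with the noisy phase before resynthesis or feature extraction.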

3-D CNN Models for Far-Field Multi-Channel Speech Recognition

A three-dimensional (3-D) convolutional neural network (CNN) architecture for multi-channel far-field ASR that processes the time, frequency and channel dimensions of the input spectrogram to learn representations using convolutional layers.
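The distinguishing detail is that the convolution kernel slides over all three axes of the multi-channel spectrogram tensor at once. A naive NumPy sketch of one such 3-D convolution (no padding, single kernel; real models use many kernels and an optimized library):

```python
import numpy as np

def conv3d_valid(x, k):
    """Naive 'valid' 3-D cross-correlation over the
    (channel, time, frequency) axes of a multi-channel spectrogram."""
    C, T, F = x.shape
    c, t, f = k.shape
    out = np.zeros((C - c + 1, T - t + 1, F - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for l in range(out.shape[2]):
                out[i, j, l] = np.sum(x[i:i + c, j:j + t, l:l + f] * k)
    return out
```

Convolving across the channel axis lets the network exploit inter-microphone cues directly, instead of collapsing the channels with a separate beamformer first.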

An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition

An integrated end-to-end automatic speech recognition (ASR) paradigm is proposed that jointly learns the front-end speech signal processing and the back-end acoustic modeling, leading to a unified deep neural network (DNN) framework for distant speech processing that achieves both high-quality enhanced speech and high-accuracy ASR simultaneously.