Scene-Agnostic Multi-Microphone Speech Dereverberation

Yochai Yemini, Ethan Fetaya, Haggai Maron, Sharon Gannot
Neural networks (NNs) have been widely applied in speech processing tasks, in particular those employing microphone arrays. Nevertheless, most existing NN architectures can only deal with fixed and position-specific microphone arrays. In this paper, we present an NN architecture that can cope with microphone arrays in which the number and positions of the microphones are unknown, and demonstrate its applicability in the speech dereverberation task. To this end, our approach harnesses recent…

One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement

A new causal, array-geometry-agnostic multi-channel personalized speech enhancement (PSE) model is proposed, which can generate a high-quality enhanced signal from an arbitrary microphone geometry and outperforms a model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy.

Controllable Multichannel Speech Dereverberation based on Deep Neural Networks

A novel deep-neural-network-based multichannel speech dereverberation algorithm is proposed, in which the dereverberation level is controllable by feeding a single floating-point number to the model as a target controller.



UNet++-Based Multi-Channel Speech Dereverberation and Distant Speech Recognition

This work proposes using a recently introduced fully convolutional network (FCN) architecture, UNet++, for multichannel speech dereverberation and distant speech recognition (DSR), and presents DSR results on the multiple distant microphone datasets of the AMI meeting corpus.

Neural Speech Separation Using Spatially Distributed Microphones

Speech recognition experimental results show that the proposed neural network based speech separation method significantly outperforms baseline multi-channel speech separation systems.

Multi-Microphone Complex Spectral Mapping for Speech Dereverberation

  • Zhong-Qiu Wang, Deliang Wang
  • Physics
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
Experimental results on multi-channel speech dereverberation demonstrate the effectiveness of the proposed approach, and the integration of multi-microphone complex spectral mapping with beamforming and post-filtering is also investigated.

On Learning Sets of Symmetric Elements

This paper characterizes the space of linear layers that are equivariant both to element reordering and to the inherent symmetries of the elements, such as translation in the case of images, and shows that networks composed of these layers are universal approximators of both invariant and equivariant functions.
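
A minimal NumPy sketch of one layer with both equivariance properties, assuming the set elements are 1-D signals whose inherent symmetry is circular translation. Each element passes through one circular convolution, and the sum of all elements passes through a second one. The names `circ_conv` and `dss_layer`, the kernel sizes, and the choice of circular convolution are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def circ_conv(x, kernel):
    """Circular cross-correlation along the last axis (translation equivariant)."""
    return sum(kernel[s] * np.roll(x, -s, axis=-1) for s in range(len(kernel)))

def dss_layer(x, k_self, k_agg):
    """x: (n_elements, length). Returns an array of the same shape.

    One translation-equivariant operator acts on each element, and a
    second one acts on the sum over the set, which is shared by all elements.
    """
    aggregated = x.sum(axis=0, keepdims=True)        # order-independent summary
    return circ_conv(x, k_self) + circ_conv(aggregated, k_agg)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 32))                     # 5 set elements, 32 samples each
k_self = rng.standard_normal(3)                      # hypothetical kernels
k_agg = rng.standard_normal(3)

out = dss_layer(x, k_self, k_agg)
perm = rng.permutation(x.shape[0])
out_perm = dss_layer(x[perm], k_self, k_agg)         # reorder the set elements
out_shift = dss_layer(np.roll(x, 3, axis=1), k_self, k_agg)  # translate every element
```

Reordering the inputs reorders the outputs in the same way, and shifting every element shifts every output, so `out_perm` matches `out[perm]` and `out_shift` matches `np.roll(out, 3, axis=1)` up to floating-point error.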

Multi-Microphone Speaker Separation based on Deep DOA Estimation

The proposed deep direction estimation for speech separation (DDESS) method is inspired by the recent advances in deep clustering methods and is closely associated with the spatial information, as manifested by the different speakers’ directions of arrival.

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

This paper proposes transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation based on the filter-and-sum network, and shows how TAC significantly improves the separation performance across various numbers of microphones in noisy reverberant separation tasks with ad-hoc arrays.
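
The three steps named by TAC can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: plain matrix multiplications with ReLU stand in for the learned layers, there is no residual connection, and the function name `tac_block` and all weight shapes are hypothetical rather than taken from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tac_block(x, w_transform, w_average, w_concat):
    """x: (n_channels, n_features). Returns (n_channels, n_out)."""
    # Transform: the same weights are applied to every channel.
    h = relu(x @ w_transform)
    # Average: pool over channels, then transform the pooled vector.
    pooled = relu(h.mean(axis=0, keepdims=True) @ w_average)
    # Concatenate: attach the pooled vector to each channel's features.
    pooled = np.repeat(pooled, x.shape[0], axis=0)
    z = np.concatenate([h, pooled], axis=1)
    return relu(z @ w_concat)

rng = np.random.default_rng(0)
n_feat, n_hid = 8, 16
w_t = rng.standard_normal((n_feat, n_hid))
w_a = rng.standard_normal((n_hid, n_hid))
w_c = rng.standard_normal((2 * n_hid, n_feat))

x4 = rng.standard_normal((4, n_feat))                # 4-microphone input
x6 = rng.standard_normal((6, n_feat))                # 6-microphone input, same weights
y4 = tac_block(x4, w_t, w_a, w_c)
y6 = tac_block(x6, w_t, w_a, w_c)

perm = rng.permutation(4)
y4_perm = tac_block(x4[perm], w_t, w_a, w_c)         # reorder the microphones
```

Because no weight depends on the channel index or the channel count, the same block accepts 4 or 6 channels, and reordering the input channels only reorders the output rows.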

gpuRIR: A python library for Room Impulse Response simulation with GPU acceleration

A new implementation of the Image Source Method (ISM) is presented that dramatically improves the computation speed of the ISM by using Graphics Processing Units (GPUs) to parallelize both the simulation of multiple RIRs and the computation of the images inside each RIR.

Burst Image Deblurring Using Permutation Invariant Convolutional Neural Networks

The novel convolutional architecture has a simultaneous view of all frames in the burst, and by construction treats them in an order-independent manner to effectively detect and leverage subtle cues scattered across different frames, while ensuring that each frame gets a full and equal consideration regardless of its position in the sequence.

Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals

The proposed supervised learning method, based on a convolutional neural network, estimates the directions of arrival (DOAs) of multiple speakers; its ability to adapt to unseen acoustic conditions and its robustness to unseen noise types are demonstrated.

Reverberant overlap- and self-masking in consonant identification.

The results for the natural and synthetic syllables indicated that the effect of reverberation on identification of consonants following /s/ was not comparable to masking by either the /s/-spectrum-shaped noise or the babble.