DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays

@article{Furnon2021DNNBasedME,
  title={DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays},
  author={Nicolas Furnon and Romain Serizel and Slim Essid and Irina Illina},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2021},
  volume={29},
  pages={2310-2323}
}
Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions for speech understanding and speech recognition in noisy environments. However, in the context of ad-hoc microphone arrays, many challenges remain that raise the need for distributed processing. In this paper, we propose to extend a previously introduced distributed DNN-based time-frequency mask estimation scheme that can efficiently use spatial information in the form of so…
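To make the abstract's scheme concrete, here is a minimal sketch of the two-step distributed mask estimation it describes: each node first estimates a time-frequency mask from its local signal, exchanges a pre-filtered ("compressed") estimate with the other nodes, and then re-estimates its mask from the local signal plus the received compressed signals. The GRU architecture, the layer sizes, and the use of plain spectral masking (rather than a first-pass beamformer) to form the compressed signal are my own simplifying assumptions, not the authors' implementation.

import torch

class MaskNet(torch.nn.Module):
    """Estimates a time-frequency mask from stacked magnitude spectrograms."""
    def __init__(self, n_freq, n_inputs):
        super().__init__()
        self.rnn = torch.nn.GRU(n_freq * n_inputs, 256, batch_first=True)
        self.out = torch.nn.Linear(256, n_freq)

    def forward(self, mags):
        # mags: (batch, time, n_freq * n_inputs) -> mask values in [0, 1]
        h, _ = self.rnn(mags)
        return torch.sigmoid(self.out(h))

n_freq, n_nodes, n_frames = 257, 3, 100
local_net = MaskNet(n_freq, n_inputs=1)           # step 1: local signal only
fusion_net = MaskNet(n_freq, n_inputs=n_nodes)    # step 2: local + compressed signals

local_mag = torch.rand(1, n_frames, n_freq)       # |STFT| of this node's reference mic
received = [torch.rand(1, n_frames, n_freq)       # compressed signals from other nodes
            for _ in range(n_nodes - 1)]

mask_1 = local_net(local_mag)                     # first-pass local mask
compressed = local_mag * mask_1                   # this node's pre-filtered estimate
mask_2 = fusion_net(torch.cat([compressed] + received, dim=-1))  # spatially informed mask

The point of the second pass is that the concatenated compressed signals carry the spatial information of the whole array, while each node only ever transmits a single pre-filtered channel.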
Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes
TLDR
This paper uses an attention mechanism to put more weight on the relevant signals sent throughout the array and to neglect redundant or empty channels, so that the spatial information captured by the different devices of the microphone array can be processed efficiently (a toy sketch of such channel weighting follows this list of papers).
Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
TLDR
This work proposes a novel triple-path network for ad-hoc array processing in the time domain: a multiple-input multiple-output architecture that can simultaneously enhance the signals at all microphones.
Learning to Rank Microphones for Distant Speech Recognition
TLDR
This work proposes MicRank, a learning-to-rank framework in which a neural network is trained to rank the available channels directly from the recognition performance on the training set; the approach is agnostic with respect to the array geometry and the type of recognition back-end.
Distributed Speech Separation in Spatially Unconstrained Microphone Arrays
TLDR
This work proposes a distributed algorithm that can process spatial information in a spatially unconstrained microphone array; it relies on a convolutional recurrent neural network that can exploit the signal diversity from the distributed nodes.
Research on Speech Enhancement Algorithm of Multiresolution Cochleagram Based on Skip Connection Deep Neural Network
TLDR
The noise reduction method adopts the minimum mean-square error short-time spectral amplitude estimator, takes the improved multiresolution cochleagram (I-MRCG) as the input feature, and uses a skip-connection DNN (Skip-DNN) as the training network to improve the model's speech enhancement performance.
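Since the first paper in the list above hinges on weighting a varying number of incoming channels, here is a toy sketch of attention-based channel weighting over magnitude spectrograms; the linear scoring layer and all shapes are my own choices, not that paper's architecture.

import torch

class ChannelAttention(torch.nn.Module):
    """Scores each incoming channel per frame and fuses them with softmax weights."""
    def __init__(self, n_freq):
        super().__init__()
        self.score = torch.nn.Linear(n_freq, 1)

    def forward(self, channels):
        # channels: (batch, n_channels, time, n_freq); n_channels may vary per call
        logits = self.score(channels).squeeze(-1)             # (batch, n_channels, time)
        weights = torch.softmax(logits, dim=1)                # relevance of each channel
        return (weights.unsqueeze(-1) * channels).sum(dim=1)  # (batch, time, n_freq)

att = ChannelAttention(n_freq=257)
fused_4 = att(torch.rand(1, 4, 100, 257))   # four channels received at this node
fused_2 = att(torch.rand(1, 2, 100, 257))   # the same module handles two channels

Because the softmax runs over the channel axis, the same parameters apply however many nodes are transmitting, and redundant or empty channels can be driven toward near-zero weight during training.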

References

Showing 1-10 of 57 references
DNN-based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays
TLDR
This work proposes to extend the distributed adaptive node-specific signal estimation approach to a neural network framework, and shows that the additional compressed signal exchanged between nodes can be leveraged to predict the masks, leading to better speech enhancement performance than when mask estimation relies only on the local signals.
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks
TLDR
It is shown that using a single mask across microphones for covariance prediction with minima-limited post-masking yields the best result in terms of signal-level quality measures and speech recognition word error rates in a mismatched training condition.
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition
TLDR
This paper introduces a neural network architecture that performs multichannel filtering in its first layer, and shows that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction.
Neural network based spectral mask estimation for acoustic beamforming
TLDR
A neural network based approach to acoustic beamforming is presented: the network estimates spectral masks from which the cross-power spectral density matrices of speech and noise are computed, and these in turn yield the beamformer coefficients (a worked sketch of this recipe follows the reference list).
Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition
TLDR
A neural network approach to far-field speech separation using multiple microphones is presented; it learns to implicitly determine the number of speakers constituting an input speech mixture and significantly outperforms the single-microphone permutation invariant training framework.
Combining Deep Neural Networks and Beamforming for Real-Time Multi-Channel Speech Enhancement using a Wireless Acoustic Sensor Network
Enea Ceolini, Shih-Chii Liu. IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), 2019.
TLDR
This work presents a multi-channel speech enhancement algorithm that combines a neural network with beamforming, deployed in real time on a wireless acoustic sensor network (WASN) of distributed microphones, and considers models with a small parameter count and low computational complexity.
DNN-based speech mask estimation for eigenvector beamforming
In this paper, we present an optimal multi-channel Wiener filter, which consists of an eigenvector beamformer and a single-channel postfilter. We show that both components solely depend on a speech…
FaSNet: Low-Latency Adaptive Beamforming for Multi-Microphone Audio Processing
TLDR
Experiments show that despite its small model size, FaSNet is able to outperform several traditional oracle beamformers with respect to scale-invariant signal-to-noise ratio (SI-SNR) in reverberant speech enhancement and separation tasks.
A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction
TLDR
This framework starts by formulating the minimum mean-square error (MMSE)-based solution in the context of multiple simultaneous speakers and background noise, and highlights the importance of estimating the speakers' activities.
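Several of the entries above, notably "Neural network based spectral mask estimation for acoustic beamforming" and "Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks", share one recipe: a network predicts speech and noise masks, the masks weight the STFT frames to form cross-power spectral density (cross-PSD) matrices, and the beamformer coefficients follow in closed form. Below is a minimal numpy sketch of that recipe, assuming a rank-1 speech model for the steering vector and diagonal loading of the noise PSD; both are common choices, but mine rather than necessarily those papers'.

import numpy as np

def psd_matrix(stft, mask):
    """Mask-weighted cross-PSD per frequency. stft: (F, T, M) complex, mask: (F, T)."""
    weighted = stft * mask[..., None]
    psd = np.einsum('ftm,ftn->fmn', weighted, stft.conj())
    return psd / np.maximum(mask.sum(axis=1), 1e-8)[:, None, None]

def mvdr_weights(psd_s, psd_n):
    """Per-frequency MVDR coefficients w = Phi_n^{-1} d / (d^H Phi_n^{-1} d)."""
    n_freq, n_mics, _ = psd_s.shape
    w = np.zeros((n_freq, n_mics), dtype=complex)
    for f in range(n_freq):
        _, vecs = np.linalg.eigh(psd_s[f])      # Hermitian eigendecomposition
        d = vecs[:, -1]                         # steering vector: principal eigenvector
        num = np.linalg.solve(psd_n[f] + 1e-6 * np.eye(n_mics), d)  # diagonal loading
        w[f] = num / (d.conj() @ num)
    return w

rng = np.random.default_rng(0)
F, T, M = 257, 100, 4
stft = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
speech_mask = rng.random((F, T))                # in practice: output of a mask network
w = mvdr_weights(psd_matrix(stft, speech_mask),
                 psd_matrix(stft, 1.0 - speech_mask))
enhanced = np.einsum('fm,ftm->ft', w.conj(), stft)  # beamformed STFT, shape (F, T)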