DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays
@article{Furnon2021DNNBasedME,
  title   = {DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays},
  author  = {Nicolas Furnon and Romain Serizel and Slim Essid and Irina Illina},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year    = {2021},
  volume  = {29},
  pages   = {2310--2323}
}
Deep neural network (DNN)-based speech enhancement algorithms for microphone arrays have proven to be efficient solutions for speech understanding and speech recognition in noisy environments. However, in the context of ad-hoc microphone arrays, many challenges remain and raise the need for distributed processing. In this paper, we propose to extend a previously introduced distributed DNN-based time-frequency mask estimation scheme that can efficiently use spatial information in the form of so…
5 Citations
Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes
- Physics
- 2021 29th European Signal Processing Conference (EUSIPCO)
- 2021
This paper uses an attention mechanism to put more weight on the relevant signals sent throughout the array and to neglect redundant or empty channels, so that the spatial information captured by the different devices of the microphone array can be processed efficiently.
Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
- Computer Science
- 2021
This work proposes a novel triple-path network for ad-hoc array processing in the time domain: a multiple-input multiple-output architecture that can simultaneously enhance the signals at all microphones.
Learning to Rank Microphones for Distant Speech Recognition
- Computer Science
- Interspeech
- 2021
This work proposes MicRank, a learning-to-rank framework in which a neural network is trained to rank the available channels directly from recognition performance on the training set; the approach is agnostic with respect to the array geometry and the type of recognition back-end.
Distributed Speech Separation in Spatially Unconstrained Microphone Arrays
- Computer Science
- ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
This work proposes a distributed algorithm that can process spatial information in a spatially unconstrained microphone array that relies on a convolutional recurrent neural network that can exploit the signal diversity from the distributed nodes.
Research on Speech Enhancement Algorithm of Multiresolution Cochleagram Based on Skip Connection Deep Neural Network
- Computer Science
- Journal of Sensors
- 2022
The noise reduction method adopts the Minimum Mean-Square Error Short-Time Spectral Amplitude estimator, takes I-MRCG as the input feature, and uses Skip-DNN as the training network to improve the speech enhancement performance of the model.
References
Showing 1-10 of 57 references
DNN-based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This work proposes to extend the distributed adaptive node-specific signal estimation approach to a neural network framework and shows that the additional signal exchanged between nodes can be leveraged to predict the masks, leading to better speech enhancement performance than when the mask estimation relies only on the local signals.
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks
- Physics
- INTERSPEECH
- 2016
It is shown that using a single mask across microphones for covariance prediction with minima-limited post-masking yields the best result in terms of signal-level quality measures and speech recognition word error rates in a mismatched training condition.
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
This paper introduces a neural network architecture that performs multichannel filtering in the first layer of the network, and shows that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction.
Neural network based spectral mask estimation for acoustic beamforming
- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
A neural network based approach to acoustic beamforming is presented: the network estimates spectral masks, from which the cross-power spectral density matrices of speech and noise are derived, and these in turn are used to compute the beamformer coefficients.
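The pipeline summarized above (mask estimation, then mask-weighted cross-PSD matrices, then beamformer coefficients) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, array shapes, and the specific MVDR formulation used for the last step are assumptions.

```python
import numpy as np

def masked_psd(stft, mask):
    """Mask-weighted cross-power spectral density matrices.

    stft: (mics, frames, freqs) complex STFT of the noisy mixture.
    mask: (frames, freqs) real-valued mask in [0, 1] (e.g. from a DNN).
    Returns: (freqs, mics, mics) PSD matrix per frequency bin.
    """
    # Sum_t mask[t, f] * x[m, t, f] * conj(x[n, t, f]) for each bin f.
    psd = np.einsum('mtf,ntf,tf->fmn', stft, stft.conj(), mask)
    norm = mask.sum(axis=0).clip(min=1e-10)  # per-bin mask mass
    return psd / norm[:, None, None]

def mvdr_weights(psd_speech, psd_noise, ref_mic=0):
    """MVDR weights per bin: w = (Phi_n^-1 Phi_s) e_ref / tr(Phi_n^-1 Phi_s)."""
    n_freqs, n_mics, _ = psd_speech.shape
    w = np.zeros((n_freqs, n_mics), dtype=complex)
    for f in range(n_freqs):
        # Diagonal loading keeps the noise PSD invertible.
        num = np.linalg.solve(psd_noise[f] + 1e-6 * np.eye(n_mics),
                              psd_speech[f])
        w[f] = num[:, ref_mic] / (np.trace(num) + 1e-10)
    return w
```

With a speech mask `m` and a noise mask such as `1 - m`, the enhanced STFT is obtained by applying the weights per bin, e.g. `np.einsum('fm,mtf->tf', w.conj(), stft)`.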
Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition
- Physics, Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
A neural network approach to far-field speech separation using multiple microphones is proposed; it can implicitly infer the number of speakers in an input speech mixture and significantly outperforms the single-microphone permutation invariant training framework.
Combining Deep Neural Networks and Beamforming for Real-Time Multi-Channel Speech Enhancement using a Wireless Acoustic Sensor Network
- Computer Science
- 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2019
This work presents a multi-channel speech enhancement algorithm combining a neural network with beamforming, deployed in real time on a wireless acoustic sensor network (WASN) of distributed microphones, and considers models with a small parameter count and low computational complexity.
Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor networks
- Computer Science
- Signal Processing
- 2015
DNN-based speech mask estimation for eigenvector beamforming
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
In this paper, we present an optimal multi-channel Wiener filter, which consists of an eigenvector beamformer and a single-channel postfilter. We show that both components solely depend on a speech…
FaSNet: Low-Latency Adaptive Beamforming for Multi-Microphone Audio Processing
- Computer Science
- 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2019
Experiments show that despite its small model size, FaSNet is able to outperform several traditional oracle beamformers with respect to scale-invariant signal-to-noise ratio (SI-SNR) in reverberant speech enhancement and separation tasks.
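The SI-SNR metric referenced here measures the energy of the orthogonal projection of the estimate onto the reference relative to the residual, making it invariant to rescaling of the estimate. A minimal sketch (the zero-mean convention and the `eps` stabilizer are common choices, not taken from the paper):

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB between 1-D signals."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference: the "target" component.
    s_target = (est @ ref) / (ref @ ref + eps) * ref
    e_noise = est - s_target  # residual orthogonal to the reference
    return 10.0 * np.log10((s_target @ s_target) / (e_noise @ e_noise + eps))
```

Because the projection absorbs any gain applied to the estimate, `si_snr(2 * est, ref)` equals `si_snr(est, ref)` up to the `eps` terms.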
A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2013
This framework starts by formulating the minimum-mean-square error (MMSE)-based solution in the context of multiple simultaneous speakers and background noise, and outlines the importance of estimating the activities of the speakers.