A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation
@article{Luo2021ATG,
  title   = {A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation},
  author  = {Yi Luo},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2112.03533}
}
Frequency-domain neural beamformers are the mainstream approach in recent multi-channel speech separation models. Despite their well-defined behavior and proven effectiveness, such frequency-domain beamformers are still limited by a bounded oracle performance and by the difficulty of designing proper networks for complex-valued operations. In this paper, we propose a time-domain generalized Wiener filter (TD-GWF), an extension to the conventional frequency-domain beamformers that has…
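As a rough illustration of the idea behind a time-domain Wiener-style filter, the sketch below computes a real-valued multi-frame least-squares filter from a multi-channel mixture and a separately estimated target signal (e.g., the output of a separation network), then applies it by overlap-add. The frame size, hop, variable names, and overall setup are illustrative assumptions, not the paper's exact TD-GWF formulation.

```python
# Minimal sketch: a real-valued, multi-frame Wiener-style filter in the time domain.
# Assumptions: numpy only, rectangular framing, per-utterance closed-form filter.
import numpy as np

def frame_signal(x, frame_len=32, hop=16):
    """Split a 1-D signal into overlapping frames, shape (num_frames, frame_len)."""
    num_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    return x[idx]

def td_wiener_filter(mixture, target_est, frame_len=32, hop=16, eps=1e-6):
    """
    mixture:    (num_mics, num_samples) multi-channel mixture waveform
    target_est: (num_samples,) single-channel target estimate (e.g., from a NN)
    Returns a refined target estimate obtained by projecting the framed
    multi-channel mixture onto the framed target estimate via least squares.
    """
    # Stack frames from all microphones: (num_frames, num_mics * frame_len)
    Y = np.concatenate([frame_signal(ch, frame_len, hop) for ch in mixture], axis=1)
    S = frame_signal(target_est, frame_len, hop)           # (num_frames, frame_len)

    # Closed-form least-squares filter W minimizing ||Y W - S||_F^2,
    # a real-valued analogue of the classic Wiener solution R_yy^{-1} r_ys.
    R_yy = Y.T @ Y + eps * np.eye(Y.shape[1])              # mixture covariance
    R_ys = Y.T @ S                                         # mixture/target cross term
    W = np.linalg.solve(R_yy, R_ys)

    # Apply the filter and overlap-add the enhanced frames back to a waveform.
    S_hat = Y @ W
    out = np.zeros(len(target_est))
    norm = np.zeros(len(target_est))
    for i, start in enumerate(range(0, hop * (S_hat.shape[0] - 1) + 1, hop)):
        out[start:start + frame_len] += S_hat[i]
        norm[start:start + frame_len] += 1.0
    return out / np.maximum(norm, 1.0)
```

In this sketch the filter is real-valued and spans multiple frames per microphone, which avoids the complex-valued operations of frequency-domain beamformers; the paper's actual formulation should be consulted for the precise transform and optimization used.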
References
Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
A novel method of time-varying beamforming with estimated complex spectra for single- and multi-channel speech enhancement, where deep neural networks are used to predict the real and imaginary components of the direct-path signal from noisy and reverberant ones.
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This paper proposes transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation based on the filter-and-sum network, and shows how TAC significantly improves the separation performance across various numbers of microphones in noisy reverberant separation tasks with ad-hoc arrays.
Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output
- Computer Science
- 2021
A “multi-channel input, multi-channel multi-source output” (MIMMO) speech separation system entitled “Beam-Guided TasNet”, in which MC-Conv-TasNet and MVDR interact with and reinforce each other under a directed cyclic graph.
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation
- Physics · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2021
This study first investigates offline utterance-wise speaker separation, then extends it to block-online continuous speech separation, and integrates multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation.
ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation
- Computer Science · ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
This paper proposes a novel all deep learning MVDR framework, where the matrix inversion and eigenvalue decomposition are replaced by two recurrent neural networks (RNNs), to resolve both issues at the same time.
A speech enhancement algorithm by iterating single- and multi-microphone processing and its application to robust ASR
- Computer Science, Physics · 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
The core of the algorithm estimates a time-frequency mask that represents the target speech and uses masking-based beamforming to enhance the corrupted speech, and a masking-based post-filter is proposed to further suppress the noise in the beamformer output.
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
- Physics · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
This paper proposes to analyze a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering.
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2019
A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation, which significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.
Complex Ratio Masking for Monaural Speech Separation
- Physics · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
The proposed approach improves over other methods when evaluated with several objective metrics, including the perceptual evaluation of speech quality (PESQ), and in a listening test in which subjects preferred the proposed approach at least 69% of the time.
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement
- Computer Science · 2021 IEEE Spoken Language Technology Workshop (SLT)
- 2021
This work introduces sequential neural beamforming, which alternates between neural-network-based spectral separation and beamforming-based spatial separation, and a multi-frame beamforming method that significantly improves results by adding contextual frames to the beamforming formulation.