• Corpus ID: 244920849

A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation

  • Yi Luo
  • Published 7 December 2021
  • Engineering
  • ArXiv
Frequency-domain neural beamformers are the mainstream methods in recent multi-channel speech separation models. Despite their well-defined behaviors and effectiveness, such frequency-domain beamformers still have the limitations of a bounded oracle performance and the difficulty of designing proper networks for complex-valued operations. In this paper, we propose a time-domain generalized Wiener filter (TD-GWF), an extension to the conventional frequency-domain beamformers that has… 
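As context for the conventional baseline the paper extends (this is not the TD-GWF itself, and all PSD values below are toy assumptions): the classic single-channel frequency-domain Wiener filter applies a per-frequency gain G(f) = Pss(f) / (Pss(f) + Pnn(f)) to the noisy spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-frequency power spectral density (PSD) estimates.
n_freq = 257
psd_speech = rng.uniform(0.5, 2.0, n_freq)  # assumed speech PSD estimate
psd_noise = rng.uniform(0.1, 0.5, n_freq)   # assumed noise PSD estimate

# Classic Wiener gain: G(f) = Pss(f) / (Pss(f) + Pnn(f)).
gain = psd_speech / (psd_speech + psd_noise)

# Apply the gain to one noisy STFT frame (complex-valued).
noisy_frame = rng.standard_normal(n_freq) + 1j * rng.standard_normal(n_freq)
enhanced_frame = gain * noisy_frame

# The Wiener gain attenuates but never amplifies any bin.
assert np.all((gain > 0) & (gain < 1))
```

The TD-GWF in the paper generalizes this idea to a learned time-domain filterbank rather than the fixed STFT used in this sketch.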

Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR
A novel method of time-varying beamforming with estimated complex spectra for single- and multi-channel speech enhancement, where deep neural networks are used to predict the real and imaginary components of the direct-path signal from noisy and reverberant ones.
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation
This paper proposes transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation based on the filter-and-sum network, and shows how TAC significantly improves the separation performance across various numbers of microphones in noisy reverberant separation tasks with ad-hoc arrays.
Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output
A “multi-channel input, multi-channel multi-source output” (MIMMO) speech separation system entitled “Beam-Guided TasNet”, where MC-Conv-TasNet and MVDR can interact and promote each other more compactly under a directed cyclic graph.
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation
This study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation, and integrates multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation.
ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation
This paper proposes a novel all deep learning MVDR framework, where the matrix inversion and eigenvalue decomposition are replaced by two recurrent neural networks (RNNs), to resolve both issues at the same time.
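For reference, the closed-form MVDR solution whose matrix inversion ADL-MVDR replaces with recurrent networks is w = Φ_nn⁻¹ d / (dᴴ Φ_nn⁻¹ d). A toy NumPy sketch of that conventional formula, with an assumed noise covariance and steering vector (not the paper's learned estimator):

```python
import numpy as np

rng = np.random.default_rng(1)
n_mics = 4

# Toy Hermitian positive-definite noise spatial covariance for one frequency bin.
a = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
phi_nn = a @ a.conj().T + n_mics * np.eye(n_mics)

# Assumed unit-modulus steering vector toward the target source.
d = np.exp(1j * 2 * np.pi * rng.uniform(size=n_mics))

# Closed-form MVDR weights: w = Phi_nn^{-1} d / (d^H Phi_nn^{-1} d).
# Using solve() instead of an explicit inverse for numerical stability.
num = np.linalg.solve(phi_nn, d)
w = num / (d.conj() @ num)

# Distortionless constraint: w^H d = 1 (target passes undistorted).
assert np.isclose(w.conj() @ d, 1.0)
```

The matrix inversion (here `np.linalg.solve`) and any eigendecomposition-based steering-vector estimation are exactly the steps the ADL-MVDR paper argues are unstable to train through, motivating their RNN replacement.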
A speech enhancement algorithm by iterating single- and multi-microphone processing and its application to robust ASR
The core of the algorithm estimates a time-frequency mask that represents the target speech and uses masking-based beamforming to enhance the corrupted speech; a mask-based post-filter is also proposed to further suppress the noise in the beamformer output.
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
This paper proposes to analyze a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering.
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
  • Yi Luo, N. Mesgarani
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2019
A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation, which significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.
Complex Ratio Masking for Monaural Speech Separation
The proposed approach improves over other methods when evaluated with several objective metrics, including the perceptual evaluation of speech quality (PESQ), and in a listening test where subjects prefer the proposed approach at a rate of at least 69%.
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement
This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation, and introduces a multi-frame beamforming method which improves the results significantly by adding contextual frames to the beamforming formulations.