Corpus ID: 239049687

TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul T. Calamia, Deliang Wang
In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), which is a recurrent neural network (RNN) augmented with self-attention. Next, an ARN is introduced… 
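The three paths described above can be sketched as sequence-modeling passes over three different axes of a chunked multichannel feature tensor. This is an illustrative sketch, not the authors' implementation: the stand-in `seq_model` function replaces the attentive RNN (ARN), and the tensor layout is assumed.

```python
import numpy as np

def seq_model(x, axis):
    """Stand-in for an attentive RNN (ARN): a residual mean-centering along
    the given axis, just to mark which dimension each path treats as 'time'."""
    return x + (x - x.mean(axis=axis, keepdims=True))

# Toy feature tensor: (channels, num_chunks, chunk_len, features)
C, K, S, F = 4, 6, 8, 16
x = np.random.randn(C, K, S, F)

x = seq_model(x, axis=2)  # path 1: intra-chunk (within each chunk)
x = seq_model(x, axis=1)  # path 2: inter-chunk (across chunks)
x = seq_model(x, axis=0)  # path 3: spatial (across microphone channels)

print(x.shape)  # every path preserves the shape: (4, 6, 8, 16)
```

The key point is that the third pass runs along the channel axis, so the same module handles spatial modeling that a dual-path network leaves out.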

Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation
  • Yi Luo, Zhuo Chen, T. Yoshioka
  • Computer Science, Engineering
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
Experiments show that by replacing the 1-D CNN with DPRNN and applying sample-level modeling in the time-domain audio separation network (TasNet), a new state-of-the-art performance on WSJ0-2mix is achieved with a model 20 times smaller than the previous best system.
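The dual-path idea that TPARN builds on folds a long sequence into overlapping chunks so each RNN pass sees a short sequence. A minimal sketch, with our own names and a 50% overlap assumption (not the DPRNN code):

```python
import numpy as np

def segment(x, S):
    """Fold a (L, F) feature sequence into (K, S, F) chunks with hop S // 2,
    zero-padding the tail so every chunk is full length."""
    L, F = x.shape
    hop = S // 2
    K = int(np.ceil((L - S) / hop)) + 1 if L > S else 1
    pad = (K - 1) * hop + S - L
    x = np.pad(x, ((0, pad), (0, 0)))
    return np.stack([x[k * hop : k * hop + S] for k in range(K)])

x = np.random.randn(1000, 64)   # long sequence of 64-dim features
chunks = segment(x, S=64)       # intra-chunk RNN runs over axis 1 (length 64),
print(chunks.shape)             # inter-chunk RNN over axis 0 (31 chunks)
```

Choosing the chunk size near the square root of the sequence length keeps both RNN passes short, which is what makes sample-level time-domain modeling tractable.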
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network
By formulating the dereverberation problem as a denoising problem in which the direct path is separated from the reverberations, a TasNet denoising autoencoder can outperform a deep LSTM baseline on log-power magnitude spectrogram input in both causal and non-causal settings.
Channel-Attention Dense U-Net for Multichannel Speech Enhancement
This paper proposes Channel-Attention Dense U-Net, in which the channel-attention unit is applied recursively on feature maps at every layer of the network, enabling the network to perform non-linear beamforming.
Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in The Time Domain
  • Ashutosh Pandey, Deliang Wang
  • Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
Experimental results show that the proposed model significantly outperforms other real-time state-of-the-art models in terms of objective intelligibility and quality scores.
Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks
The experimental results confirm the outstanding denoising capability of the proposed SE systems on the three tasks and the benefits of using the residual architecture on the overall SE performance.
Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder
Multi-channel TCDAEs are evaluated in multi-channel speech enhancement experiments, yielding significant improvements over single-channel DAEs in terms of signal-to-distortion ratio, perceptual evaluation of speech quality (PESQ), and word error rate.
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
  • Yi Luo, N. Mesgarani
  • Computer Science, Medicine
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2019
A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation, which significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation
This paper proposes transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation based on the filter-and-sum network, and shows how TAC significantly improves the separation performance across various numbers of microphones in noisy reverberant separation tasks with ad-hoc arrays.
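The transform-average-concatenate (TAC) design can be sketched in a few lines; the weights, shapes, and activation choices below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
C, T, F, H = 3, 100, 64, 32          # channels, frames, features, hidden size
W1 = rng.standard_normal((F, H))     # shared per-channel transform
W2 = rng.standard_normal((2 * H, F)) # shared output transform

x = rng.standard_normal((C, T, F))
h = np.maximum(x @ W1, 0)            # transform: same weights for every channel
avg = h.mean(axis=0, keepdims=True)  # average: pooling invariant to channel order/count
z = np.concatenate([h, np.broadcast_to(avg, h.shape)], axis=-1)  # concatenate
y = np.maximum(z @ W2, 0)            # fuse the global average back into each channel
print(y.shape)  # (3, 100, 64): runs unchanged for any number of channels
```

Because the only cross-channel operation is a mean, the module is invariant to microphone permutation and count, which is what makes it suitable for ad-hoc arrays.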
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation
It is found that simply encoding inter-microphone phase patterns as additional input features during deep clustering provides a significant improvement in separation performance, even with random microphone array geometry.
A New Framework for Supervised Speech Enhancement in the Time Domain
A new learning framework that uses a loss function in the frequency domain to train a convolutional neural network (CNN) in the time domain, which substantially outperforms other speech enhancement methods.