• Corpus ID: 239049687

TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

@article{Pandey2021TPARNTA,
  title={TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement},
  author={Ashutosh Pandey and Buye Xu and Anurag Kumar and Jacob Donley and Paul T. Calamia and Deliang Wang},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.10757}
}
In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), which is a recurrent neural network (RNN) augmented with self-attention. Next, an ARN is introduced… 
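
The three paths described in the abstract can be pictured as the same sequence model applied along three different axes of a `[channel][chunk][frame]` tensor. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `apply_along`, `running_mean`, and `triple_path_block` are hypothetical names, the running mean is only a placeholder for an ARN, the feature dimension is omitted, and running the intra-chunk path before the inter-chunk path is assumed.

```python
from typing import Callable, List

Seq = List[float]
Tensor3 = List[List[List[float]]]  # indexed [channel][chunk][frame]


def apply_along(x: Tensor3, axis: int, fn: Callable[[Seq], Seq]) -> Tensor3:
    """Apply fn to every 1-D sequence that runs along `axis` of x."""
    dims = (len(x), len(x[0]), len(x[0][0]))
    out = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    # The two axes that are held fixed while we walk along `axis`.
    others = [a for a in range(3) if a != axis]
    for i in range(dims[others[0]]):
        for j in range(dims[others[1]]):
            idx = [0, 0, 0]
            idx[others[0]], idx[others[1]] = i, j
            # Gather the sequence along the chosen axis.
            seq = []
            for n in range(dims[axis]):
                idx[axis] = n
                seq.append(x[idx[0]][idx[1]][idx[2]])
            # Write the processed sequence back to the same positions.
            for n, v in enumerate(fn(seq)):
                idx[axis] = n
                out[idx[0]][idx[1]][idx[2]] = v
    return out


def running_mean(seq: Seq) -> Seq:
    """Placeholder sequence model (NOT an ARN): causal running mean."""
    total, out = 0.0, []
    for n, v in enumerate(seq, start=1):
        total += v
        out.append(total / n)
    return out


def triple_path_block(x: Tensor3) -> Tensor3:
    """Run the placeholder model along frames, then chunks, then channels."""
    x = apply_along(x, 2, running_mean)  # intra-chunk path (assumed first)
    x = apply_along(x, 1, running_mean)  # inter-chunk path
    x = apply_along(x, 0, running_mean)  # third path: spatial (channel) axis
    return x
```

The first two calls treat each channel independently, matching the dual-path stage; only the last call mixes information across microphones.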


Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
TLDR
This work proposes a novel triple-path network for ad-hoc array processing in the time domain: a multiple-input multiple-output architecture that can simultaneously enhance signals at all microphones.
Multichannel Speech Enhancement without Beamforming
TLDR
This work proposes a two-stage strategy for multi-channel speech enhancement that does not rely on a beamformer for additional performance gains, together with a novel attentive dense convolutional network (ADCN) for predicting the real and imaginary parts of the complex spectrogram.

References

SHOWING 1-10 OF 32 REFERENCES
A short-time objective intelligibility measure for time-frequency weighted noisy speech
TLDR
An objective intelligibility measure is presented that shows high correlation (rho=0.95) with the intelligibility of both noisy and TF-weighted noisy speech, and significantly better performance than three other, more sophisticated, objective measures.
Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs
TLDR
A new model has been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay, known as perceptual evaluation of speech quality (PESQ).
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks
TLDR
It is shown that using a single mask across microphones for covariance prediction with minima-limited post-masking yields the best result in terms of signal-level quality measures and speech recognition word error rates in a mismatched training condition.
Multi-Microphone Complex Spectral Mapping for Speech Dereverberation
  • Zhong-Qiu Wang, Deliang Wang
  • Physics
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
TLDR
Experimental results on multi-channel speech dereverberation demonstrate the effectiveness of the proposed approach and the integration of multi-microphone complex spectral mapping with beamforming and post-filtering is investigated.
Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization
TLDR
Experimental results demonstrate that ARN substantially outperforms competitive approaches to time-domain speech enhancement, such as RNNs and dual-path ARNs, and that the two popular approaches to speech enhancement, complex spectral mapping and time-domain enhancement, obtain similar results for RNN and ARN with large-scale training.
Dense CNN With Self-Attention for Time-Domain Speech Enhancement
TLDR
Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
Online Self-Attentive Gated RNNs for Real-Time Speaker Separation
TLDR
This study converts a non-causal state-of-the-art separation model into a causal and real-time model and evaluates its performance under both online and offline settings, shedding light on the relative difference between causal and non-causal models when performing separation.
SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation
TLDR
This study extends a newly-developed gated recurrent neural network for monaural separation by additionally incorporating self-attention mechanisms and dense connectivity and develops an end-to-end multiple-input multiple-output system, which directly maps from the binaural waveform of the mixture to those of the speech signals.
Channel-Attention Dense U-Net for Multichannel Speech Enhancement
TLDR
This paper proposes Channel-Attention Dense U-Net, in which the channel-attention unit is applied recursively on feature maps at every layer of the network, enabling the network to perform non-linear beamforming.
Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in The Time Domain
  • Ashutosh Pandey, Deliang Wang
  • Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
TLDR
Experimental results show that the proposed model significantly outperforms other real-time state-of-the-art models in terms of objective intelligibility and quality scores.