Corpus ID: 243847541

Inter-channel Conv-TasNet for multichannel speech enhancement

Dongheon Lee, Seon-il Kim, and Jung-Woo Choi
 Abstract—Speech enhancement in multichannel settings has been realized by utilizing the spatial information embedded in multiple microphone signals. Deep neural networks (DNNs) have recently advanced this field; however, studies on efficient multichannel network structures that fully exploit spatial information and inter-channel relationships are still in their early stages. In this study, we propose an end-to-end time-domain speech enhancement network that can facilitate the… 


Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement

This work analyses the properties of a non-linear spatial filter realized by a DNN, as well as its interdependency with temporal and spectral processing, by carefully controlling the information sources available to the network. It reveals that spectral information in particular should be processed jointly with spatial information, as this increases the spatial selectivity of the filter.

Multi-stage music separation network with dual-branch attention and hybrid convolution

Experimental results show that the proposed network achieves outstanding performance on the MIR-1K dataset with fewer parameters, and competitive performance compared with state-of-the-art methods on the DSD100 and MUSDB18 datasets.

Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

The experimental results show the method outperforms single-channel masking with a learnable filterbank, and can outperform multi-channel complex masking with the STFT complex spectrum in the STGCSEN model if the learnable filterbank is transformed to a higher feature dimension.

Real-time Audio Video Enhancement with a Microphone Array and Headphones

This paper presents a complete hardware and software pipeline for real-time speech enhancement in noisy and reverberant conditions. The device consists of a microphone array and a camera mounted on

Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning

  • Rongzhi Gu, Shi-Xiong Zhang, Dong Yu
  • Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
This work proposes an integrated architecture for learning spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. Using a 2-D convolution layer, a conv2d kernel is designed to compute the inter-channel convolution differences (ICDs), which are expected to provide the spatial cues that help to distinguish the directional sources.
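The ICD idea above can be illustrated with a minimal sketch: convolve each microphone channel with a shared kernel and subtract the reference channel's output, so that any inter-channel delay shows up as a non-zero difference. This is a pure-Python toy, assuming only the general idea described in the summary; the function names and the two-tap kernel are illustrative, not taken from the paper's implementation.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D correlation of a signal with a kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def icd_features(channels, kernel, ref=0):
    """Inter-channel convolution differences w.r.t. a reference channel."""
    outs = [conv1d(ch, kernel) for ch in channels]
    return [[o - r for o, r in zip(out, outs[ref])]
            for c, out in enumerate(outs) if c != ref]

# Two-channel toy mixture: channel 1 is channel 0 delayed by one sample,
# so the ICD is non-zero wherever the convolved signal changes.
ch0 = [0.0, 1.0, 0.0, -1.0, 0.0]
ch1 = [0.0, 0.0, 1.0, 0.0, -1.0]
icds = icd_features([ch0, ch1], kernel=[0.5, 0.5])
print(icds)  # → [[-0.5, 0.0, 1.0, 0.0]]
```

In the actual network these differences are computed by learned conv2d kernels and fed to the separator as spatial features alongside the spectral ones.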

Channel-Attention Dense U-Net for Multichannel Speech Enhancement

This paper proposes Channel-Attention Dense U-Net, in which the channel-attention unit is applied recursively on feature maps at every layer of the network, enabling the network to perform non-linear beamforming.

Exploring the Best Loss Function for DNN-Based Low-latency Speech Enhancement with Temporal Convolutional Networks

An STFT-based method and a loss function using problem-agnostic speech encoder features are proposed; they improve subjective quality on the smaller dataset and achieve excellent performance on the DNS Challenge dataset.

TasNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation

  • Yi Luo, N. Mesgarani
  • Computer Science
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
Time-domain Audio Separation Network (TasNet) is proposed, which outperforms the current state-of-the-art causal and noncausal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output.

Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation

  • Yi Luo, N. Mesgarani
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2019
A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation, which significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.
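The Conv-TasNet family summarized above shares one processing chain: a learned encoder maps overlapping waveform frames to a latent space, one mask per source is applied in that space, and a learned decoder reconstructs each source by overlap-add. A minimal sketch of that chain, assuming only this published structure; the tiny identity matrices stand in for learned encoder/decoder weights, and in the real model the masks are produced per frame by the temporal convolutional separator rather than fixed.

```python
def frame(signal, win, hop):
    """Split a waveform into overlapping frames."""
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def separate(signal, enc, dec, masks, win=2, hop=1):
    """Encode frames, apply one mask per source, decode, overlap-add."""
    sources = []
    for mask in masks:
        out = [0.0] * len(signal)
        for f_idx, fr in enumerate(frame(signal, win, hop)):
            latent = matvec(enc, fr)                        # learned encoder
            masked = [m * z for m, z in zip(mask, latent)]  # per-source mask
            rec = matvec(dec, masked)                       # learned decoder
            for j, s in enumerate(rec):                     # overlap-add
                out[f_idx * hop + j] += s
        sources.append(out)
    return sources

# Toy run: identity encoder/decoder, one pass-through mask and one zero mask,
# so source 0 is the overlap-added input and source 1 is silence.
srcs = separate([1.0, 2.0, 3.0],
                enc=[[1.0, 0.0], [0.0, 1.0]],
                dec=[[1.0, 0.0], [0.0, 1.0]],
                masks=[[1.0, 1.0], [0.0, 0.0]])
print(srcs)  # → [[1.0, 4.0, 3.0], [0.0, 0.0, 0.0]]
```

Replacing the fixed STFT basis with this learned encoder/decoder pair is what lets the model be trained end-to-end on the waveform with very low latency.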

End-to-End Multi-Channel Speech Separation

This paper proposes a new end-to-end model for multi-channel speech separation that reformulates the traditional short-time Fourier transform and inter-channel phase difference as functions of time-domain convolution with a special kernel.

Raw waveform-based speech enhancement by fully convolutional networks

The proposed fully convolutional network (FCN) model can not only effectively recover the waveforms but also outperform the LPS-based DNN baseline in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).

Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

Online MVDR beamforming is proposed in which the parameters are effectively initialized and incrementally updated using multichannel nonnegative matrix factorization (MNMF); it outperformed the state-of-the-art DNN-based beamforming method in unknown environments that did not match the training data.

Single channel speech enhancement using convolutional neural network

  • Tomás Kounovský, J. Málek
  • Computer Science
    2017 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM)
  • 2017
The experiments indicate that mapping-based convolutional networks estimating log-power spectra achieve significant improvement over all competing topologies and target types. The ability of denoising autoencoders (DAEs) to enhance speech of an unseen language, based on the language diversity of the training set, is also investigated.

Convolutional Neural Networks to Enhance Coded Speech

Two postprocessing approaches applying convolutional neural networks either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs are proposed.