A Regression Approach to Speech Enhancement Based on Deep Neural Networks
TLDR
The proposed DNN approach effectively suppresses highly nonstationary noise, which is difficult to handle in general, and copes well with noisy speech recorded in real-world scenarios without producing the annoying musical artifacts commonly observed in conventional enhancement methods.
An Experimental Study on Speech Enhancement Based on Deep Neural Networks
TLDR
This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.
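The regression framework summarized above can be pictured with a small sketch: a DNN maps noisy log-power spectral features frame by frame to clean ones and is trained by minimizing mean squared error. The dimensions, the single hidden layer, and the random weights below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 257 frequency bins per frame,
# 128 hidden units, a mini-batch of 4 frames.
n_freq, n_hidden, batch = 257, 128, 4

# Random weights stand in for trained parameters.
W1 = rng.standard_normal((n_freq, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_freq)) * 0.01
b2 = np.zeros(n_freq)

def enhance(noisy_lps):
    """Map noisy log-power spectra to estimated clean log-power spectra."""
    h = np.maximum(noisy_lps @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2                        # linear regression output

noisy = rng.standard_normal((batch, n_freq))
clean = rng.standard_normal((batch, n_freq))

est = enhance(noisy)
mse = np.mean((est - clean) ** 2)  # the regression training objective
```

Training would backpropagate this MSE through the network; the sketch shows only the forward mapping and the objective.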
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won first place in the large-scale weakly supervised sound event detection task of the DCASE 2017 Challenge.
Dynamic noise aware training for speech enhancement based on deep neural networks
TLDR
Three algorithms are proposed to address the mismatch problem in deep neural network (DNN) based speech enhancement; they suppress highly non-stationary noise better than all competing state-of-the-art techniques.
Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
TLDR
This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.
Audio Set Classification with Attention Model: A Probabilistic Perspective
This paper investigates Audio Set classification. Audio Set is a large-scale weakly labelled dataset (WLD) of audio clips. In a WLD, only the presence of a label is known, without its occurrence time within the clip.
Robust speech recognition with speech enhanced deep neural networks
TLDR
A signal pre-processing front-end is proposed to enhance speech based on deep neural networks, and the enhanced speech features are used directly to train hidden Markov models (HMMs) for robust speech recognition; the framework consistently outperforms state-of-the-art speech recognition systems in all evaluation conditions.
Sound Event Detection and Time–Frequency Segmentation from Weakly Labelled Data
TLDR
A time–frequency (T–F) segmentation framework trained on weakly labelled data is proposed to tackle the sound event detection and separation problem; predicted onset and offset times can be obtained from the T–F segmentation masks.
Weakly Labelled AudioSet Tagging With Attention Neural Networks
TLDR
This work bridges the connection between attention neural networks and multiple instance learning (MIL) methods, and proposes decision-level and feature-level attention neural networks for audio tagging, achieving a state-of-the-art mean average precision.
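The decision-level attention idea summarized above can be sketched briefly: per-segment class probabilities are aggregated into a clip-level prediction by a softmax attention over time. The toy shapes and numbers below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def decision_level_attention(seg_probs, att_logits):
    """Aggregate per-segment class probabilities (T, K) into a
    clip-level prediction (K,) with softmax attention over time."""
    w = np.exp(att_logits - att_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)   # (T, K) attention weights
    return (w * seg_probs).sum(axis=0)     # weighted average over time

# Toy clip: 3 segments, 2 classes; class 0 is active only in segment 0,
# and the attention branch assigns that segment a large logit.
seg_probs = np.array([[0.9, 0.1],
                     [0.1, 0.1],
                     [0.1, 0.1]])
att_logits = np.array([[5.0, 0.0],
                      [0.0, 0.0],
                      [0.0, 0.0]])

clip = decision_level_attention(seg_probs, att_logits)
```

Because the attention weight concentrates on the active segment, the clip-level probability for class 0 stays close to that segment's probability instead of being diluted by average pooling.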
End-to-End Multi-Channel Speech Separation
TLDR
This paper proposes a new end-to-end model for multi-channel speech separation that reformulates the traditional short-time Fourier transform and inter-channel phase difference as functions of time-domain convolution with a special kernel.
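The reformulation summarized above rests on a standard identity that a small numpy sketch can illustrate (a generic illustration, not the paper's model): the DFT of one frame equals a bank of fixed complex-exponential kernels applied to that frame, so the transform can be realized as convolution with those kernels.

```python
import numpy as np

n = 64  # frame length (an arbitrary illustrative choice)
frame = np.random.default_rng(1).standard_normal(n)

# Fourier basis as a bank of time-domain kernels: one complex
# exponential per frequency bin.
k = np.arange(n)
kernel = np.exp(-2j * np.pi * np.outer(k, k) / n)  # (n_bins, n_samples)

spec = kernel @ frame    # applying the kernel bank to one frame
ref = np.fft.fft(frame)  # reference DFT of the same frame

np.allclose(spec, ref)   # True: the kernel bank reproduces the DFT
```

Sliding this kernel bank along the waveform yields the STFT frame by frame, which is what lets the transform be absorbed into a time-domain convolutional front-end.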
...