• Publications
Learning Environmental Sounds with Multi-scale Convolutional Neural Network
TLDR
A novel end-to-end network called WaveMsNet is proposed, based on the multi-scale convolution operation and a two-phase method, which obtains a better audio representation by improving the frequency resolution and learning filters across all frequency areas.
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
TLDR
This paper explores the use of a multi-channel CNN for the classification task, which aims to extract features from different channels in an end-to-end manner, and explores the use of the mixup method, which can provide higher prediction accuracy and robustness than previous models.
Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features
TLDR
Results demonstrate that the proposed method is highly effective in classification tasks by employing multi-temporal resolution and multi-level features, and that it outperforms previous methods, which account only for single-level features.
Predicting ultrasound tongue image from lip images using sequence to sequence learning.
TLDR
Experimental results show that the machine learning model can predict the tongue's motion with satisfactory performance, which demonstrates that the learned neural network can build the association between two imaging modalities.
General audio tagging with ensembling convolutional neural network and statistical features
TLDR
An ensemble learning framework is applied to combine statistical features with the outputs of the deep classifiers, with the goal of using complementary information to address the noisy label problem within the framework.
Meta learning based audio tagging
TLDR
This paper describes a solution for the general-purpose audio tagging task, one of the subtasks in the DCASE 2018 challenge, and proposes a meta learning-based ensemble method that can provide higher prediction accuracy and robustness than a single model.
An Adversarial Feature Distillation Method for Audio Classification
TLDR
A distillation method is proposed that transfers knowledge from well-trained networks to a small network. The method can compress model size while improving audio classification precision, and it is demonstrated that the small network can provide better performance.
Weakly supervised CRNN system for sound event detection with large-scale unlabeled in-domain data
TLDR
A state-of-the-art general audio tagging model is first employed to predict weak labels for unlabeled data, and a weakly supervised architecture based on the convolutional recurrent neural network is developed to address the strong annotation of sound events with the aid of the unlabeled data and its predicted labels.
Audio Tagging by Cross Filtering Noisy Labels
TLDR
This article presents a novel framework, named CrossFilter, to combat the noisy-label problem in audio tagging; it achieves state-of-the-art performance and even surpasses the ensemble models on the FSDKaggle2018 dataset.
Environmental Sound Classification Based on Multi-temporal Resolution CNN Network Combining with Multi-level Features
TLDR
Results demonstrate that the proposed method is highly effective in classification tasks by employing multi-temporal resolution and multi-level features, and that it outperforms previous methods, which account only for single-level features.
...
...