• Corpus ID: 201682582

THUEE SYSTEM FOR DCASE 2019 CHALLENGE TASK 2 Technical Report

@inproceedings{He2019THUEESF,
  title={THUEE SYSTEM FOR DCASE 2019 CHALLENGE TASK 2 Technical Report},
  author={Ke-Xin He and Yu-Han Shen and Weiqiang Zhang},
  year={2019}
}
In this report, we describe our submission for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge: Audio tagging with noisy labels and minimal supervision. Our methods are mainly based on two types of deep learning models: Convolutional Recurrent Neural Networks (CRNN) and DenseNet. To prevent overfitting, we adopted data augmentation using the mixup strategy and SpecAugment. In addition, we designed a staged loss function to train our models using… 


Comparison of Artificial Neural Network Types for Infant Vocalization Classification
TLDR
A unified neural network architecture scheme for audio classification is defined, from which various network types are derived; for all types, the most influential architectural hyperparameter was the choice of integration operations for reducing tensor dimensionality between network stages.

References

Stacked Convolutional Neural Networks for General-Purpose Audio Tagging (Technical Report)
TLDR
Several neural network architectures that learn from log-mel spectrogram inputs are proposed, using preprocessing techniques, data augmentation, loss function weighting, and pseudo-labeling to improve their performance.
Audio tagging with noisy labels and minimal supervision
TLDR
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
TLDR
This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
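The masking part of SpecAugment can be sketched in a few lines. The following is a minimal pure-Python illustration, not the THUEE authors' code or the original SpecAugment implementation. It assumes a spectrogram stored as `spec[freq_bin][time_frame]`. The function name `spec_augment`, its parameter names, and the default mask widths are all hypothetical. Time warping, the third SpecAugment operation, is deliberately omitted.

```python
import random
from typing import List, Optional

# Hypothetical layout: spec[freq_bin][time_frame], e.g. a log-mel spectrogram.
Spectrogram = List[List[float]]


def spec_augment(
    spec: Spectrogram,
    freq_mask_width: int = 8,    # F: maximum width of each frequency mask (assumed value)
    time_mask_width: int = 20,   # T: maximum width of each time mask (assumed value)
    num_freq_masks: int = 2,
    num_time_masks: int = 2,
    mask_value: float = 0.0,     # 0.0 equals the mean if features are mean-normalized
    rng: Optional[random.Random] = None,
) -> Spectrogram:
    """Frequency and time masking in the style of SpecAugment (time warping omitted)."""
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in spec]  # work on a copy; the input is not modified
    n_freq = len(out)
    n_time = len(out[0]) if n_freq else 0

    for _ in range(num_freq_masks):
        f = rng.randint(0, freq_mask_width)       # mask width drawn uniformly from 0..F
        f0 = rng.randint(0, max(n_freq - f, 0))   # start bin chosen so the mask fits
        for k in range(f0, min(f0 + f, n_freq)):
            out[k] = [mask_value] * n_time        # blank an entire frequency channel

    for _ in range(num_time_masks):
        t = rng.randint(0, time_mask_width)       # mask width drawn uniformly from 0..T
        t0 = rng.randint(0, max(n_time - t, 0))
        for row in out:
            for j in range(t0, min(t0 + t, n_time)):
                row[j] = mask_value               # blank a block of consecutive frames
    return out
```

Each call draws fresh masks, so the same clip yields a different augmented view every epoch. Setting both mask counts (or both widths) to zero returns an unchanged copy.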
Squeeze-and-Excitation Networks
  • Jie Hu, Li Shen, Gang Sun
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
TLDR
This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
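The SE block has three steps: squeeze, excitation, and scale. The sketch below shows them in pure Python for a single `(C, H, W)` feature map. It is a minimal illustration under stated assumptions, not the reference implementation. The helper names `matvec` and `se_block` are hypothetical. Biases are omitted, so the gates follow s = sigmoid(W2 · ReLU(W1 · z)). The weights `w1` (shape `C/r × C`) and `w2` (shape `C × C/r`) would normally be learned, with `r` as the reduction ratio.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def se_block(feature_map: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Squeeze-and-Excitation recalibration of a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); biases are omitted.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    s = [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, hidden)]
    # Scale: multiply every activation in channel c by its gate s[c].
    return [[[s_c * val for val in row] for row in ch] for ch, s_c in zip(feature_map, s)]
```

The bottleneck (`C/r` hidden units) keeps the extra parameter count small, which fits the "minimal additional computational cost" noted above.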
mixup: Beyond Empirical Risk Minimization
TLDR
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
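The convex-combination principle can be illustrated with a small sketch. It draws a weight λ from Beta(α, α) and applies the same λ to the two inputs and to their label vectors. In multi-label audio tagging, the labels would be multi-hot vectors, so the mixed targets become soft labels. The function name `mixup` and the default `alpha = 0.2` are assumptions for illustration only. This is not the configuration used in the THUEE system.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup(
    x1: Sequence[float], y1: Sequence[float],
    x2: Sequence[float], y2: Sequence[float],
    alpha: float = 0.2,                       # Beta(alpha, alpha) concentration (assumed value)
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], float]:
    """Mix two examples and their label vectors with a weight lam ~ Beta(alpha, alpha)."""
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)
    # The same convex combination is applied to the inputs and to the targets.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

In a training loop, this is usually applied batch-wise. Each clip is paired with a partner drawn from a random permutation of the same batch.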
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
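The adaptive moment estimates can be made concrete with a single update step on a list of scalar parameters. The function name `adam_step` is hypothetical. The defaults (`lr = 1e-3`, `beta1 = 0.9`, `beta2 = 0.999`, `eps = 1e-8`) are the values suggested in the Adam paper, not settings taken from this report.

```python
import math
from typing import List, Tuple


def adam_step(
    params: List[float], grads: List[float], m: List[float], v: List[float], t: int,
    lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update; t is the 1-based step index. Returns (params, m, v)."""
    # Exponential moving averages of the gradient (1st moment) and squared gradient (2nd moment).
    m = [beta1 * mi + (1.0 - beta1) * g for mi, g in zip(m, grads)]
    v = [beta2 * vi + (1.0 - beta2) * g * g for vi, g in zip(v, grads)]
    # Bias correction compensates for initializing m and v at zero.
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    params = [p - lr * mh / (math.sqrt(vh) + eps) for p, mh, vh in zip(params, m_hat, v_hat)]
    return params, m, v
```

Because of the bias correction, the first step has a magnitude of roughly `lr` in each coordinate, whatever the gradient scale. Dividing by the square root of the second moment is what makes the step size adaptive per parameter.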
Freesound Datasets: A Platform for the Creation of Open Audio Datasets
Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR), held in Suzhou, China, from 23 to 27 October 2017.
librosa: Audio and Music Signal Analysis in Python
TLDR
A brief overview of the librosa library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.
Audio tagging system for DCASE 2018: focusing on label noise, data augmentation and its efficient learning
  • DCASE2018 Challenge, Tech. Rep., 2018