Corpus ID: 201690821

Detection and Classification of Acoustic Scenes and Events 2019 Challenge MULTI-LABEL AUDIO TAGGING WITH NOISY LABELS AND VARIABLE LENGTH Technical Report

Boqing Zhu, Kele Xu, Dezhi Wang, Mathurin Aché
This paper describes our approach to DCASE 2019 Task 2: Audio tagging with noisy labels and minimal supervision. The challenge provides a smaller set of manually labeled data and a larger set of noisy-labeled data, requiring systems to perform multi-label audio tagging under minimal-supervision conditions. We aim to tag the audio clips with a convolutional neural network under limited computation and storage resources. To tackle the problem of noisy label data, we propose a data… 

References


Audio tagging with noisy labels and minimal supervision
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
Learning Sound Event Classifiers from Web Audio with Noisy Labels
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
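As an illustration of such a noise-robust loss (a generic sketch, not the specific loss used in any of the systems listed here), the generalized cross-entropy (Lq) loss interpolates between cross-entropy and mean absolute error:

```python
import numpy as np

def lq_loss(probs, labels, q=0.7):
    """Generalized cross-entropy (Lq) loss: (1 - p_y**q) / q.

    Interpolates between cross-entropy (q -> 0) and MAE (q = 1);
    larger q down-weights the gradient contribution of confidently
    mislabeled examples, improving robustness to label noise.
    probs: (N, C) softmax outputs; labels: (N,) integer class ids.
    """
    p_y = probs[np.arange(len(labels)), labels]
    return np.mean((1.0 - p_y ** q) / q)
```

The function name and the default `q=0.7` are illustrative choices; in practice `q` is tuned to the estimated noise level.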
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
This paper explores the use of a multi-channel CNN for the classification task, which extracts features from different channels in an end-to-end manner, and the use of the mixup method, which provides higher prediction accuracy and robustness compared with previous models.
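The core of mixup is training on convex combinations of input pairs and their labels; a minimal NumPy sketch (the `alpha` parameter of the Beta distribution is an assumed hyperparameter, not a value taken from this paper):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup augmentation: blend two examples and their (one-hot or
    multi-hot) labels with a Beta(alpha, alpha)-distributed coefficient."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Because the labels are mixed with the same coefficient as the inputs, the target remains a valid probability distribution when the inputs are one-hot labeled.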
Learning Environmental Sounds with Multi-scale Convolutional Neural Network
A novel end-to-end network called WaveMsNet is proposed based on multi-scale convolution operations and a two-phase method, which obtains better audio representations by improving the frequency resolution and learning filters across the whole frequency range.
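The multi-scale idea can be sketched as parallel 1-D convolution branches with different kernel sizes whose outputs are concatenated along the channel axis (an illustration only, not WaveMsNet itself; the kernel sizes and filter counts are assumed, and the filters are random rather than learned):

```python
import numpy as np

def multiscale_features(wave, kernel_sizes=(11, 51, 101), n_filters=4, seed=0):
    """Parallel 1-D convolution branches over a raw waveform.

    Short kernels resolve fine temporal detail, long kernels coarse
    structure; branch outputs are concatenated along the channel axis.
    """
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        filters = rng.standard_normal((n_filters, k)) / np.sqrt(k)
        feats = np.stack([np.convolve(wave, f, mode="same") for f in filters])
        branches.append(np.maximum(feats, 0.0))  # ReLU
    return np.concatenate(branches, axis=0)  # (len(kernel_sizes) * n_filters, T)
```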
Hierarchical learning for DNN-based acoustic scene classification
Two hierarchical learning methods are proposed to improve the DNN baseline performance by incorporating the hierarchical taxonomy information of environmental sounds in a deep neural network (DNN)-based acoustic scene classification framework.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
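The frequency- and time-masking parts of SpecAugment are straightforward to sketch on a log-mel spectrogram (mask counts and widths `F`, `T` below are assumed defaults; time warping, the third SpecAugment transform, is omitted from this sketch):

```python
import numpy as np

def spec_augment(mel, n_freq_masks=2, F=8, n_time_masks=2, T=20, seed=None):
    """SpecAugment-style masking on a (n_mels, n_frames) spectrogram:
    zero out a few randomly placed frequency bands and time spans."""
    rng = np.random.default_rng(seed)
    out = mel.copy()
    n_mels, n_frames = out.shape
    for _ in range(n_freq_masks):
        f = rng.integers(0, F + 1)                  # mask width
        f0 = rng.integers(0, max(1, n_mels - f + 1))  # mask start
        out[f0:f0 + f, :] = 0.0
    for _ in range(n_time_masks):
        t = rng.integers(0, T + 1)
        t0 = rng.integers(0, max(1, n_frames - t + 1))
        out[:, t0:t0 + t] = 0.0
    return out
```

Masking operates on the feature matrix itself, so no waveform re-processing is needed, which is what makes the method cheap to apply on the fly during training.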
Environmental sound classification with convolutional neural networks
  Karol J. Piczak, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
MobileNetV2: Inverted Residuals and Linear Bottlenecks
A new mobile architecture, MobileNetV2, is described that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks and across a spectrum of model sizes, and allows decoupling of the input/output domains from the expressiveness of the transformation.
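The building block of MobileNetV2 is the inverted residual: a 1×1 expansion, a 3×3 depthwise convolution, and a linear 1×1 projection back to a narrow bottleneck, with a residual connection between bottlenecks. A shape-level NumPy sketch of the stride-1 case (illustrative only; real blocks also include batch normalization, strides, and learned weights):

```python
import numpy as np

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def pointwise_conv(x, w):
    # x: (H, W, Cin), w: (Cin, Cout) -> (H, W, Cout)
    return np.einsum("hwc,cd->hwd", x, w)

def depthwise_conv3x3(x, w):
    # x: (H, W, C), w: (3, 3, C); stride 1, zero padding, per-channel filter
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * w[i, j, :]
    return out

def inverted_residual(x, expand_w, dw_w, project_w):
    h = relu6(pointwise_conv(x, expand_w))  # 1x1 expand to t*C channels
    h = relu6(depthwise_conv3x3(h, dw_w))   # 3x3 depthwise, cheap spatial mixing
    h = pointwise_conv(h, project_w)        # 1x1 project; LINEAR (no activation)
    return x + h                            # residual between narrow bottlenecks
```

The linear projection (no activation after the last 1×1 convolution) is the "linear bottleneck" of the title: applying a nonlinearity in the narrow space would destroy information.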
Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer
This work proposes Manifold Mixup which encourages the network to produce more reasonable and less confident predictions at points with combinations of attributes not seen in the training set by training on convex combinations of the hidden state representations of data samples.
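Manifold Mixup moves the interpolation from the input space to a randomly chosen hidden layer. A NumPy sketch of the idea (the function name and the toy layer stack are assumed for illustration; depth 0 reduces to ordinary input-space mixup):

```python
import numpy as np

def manifold_mixup_forward(x1, x2, y1, y2, layers, alpha=2.0, seed=None):
    """Run two inputs through the same stack of layers, interpolate the
    hidden representations (and labels) at a randomly chosen depth, then
    continue the forward pass on the mixture.

    `layers` is a list of callables, e.g. affine + ReLU functions.
    """
    rng = np.random.default_rng(seed)
    k = rng.integers(0, len(layers) + 1)   # depth at which to mix
    lam = rng.beta(alpha, alpha)
    h1, h2 = x1, x2
    for layer in layers[:k]:               # shared forward pass up to depth k
        h1, h2 = layer(h1), layer(h2)
    h = lam * h1 + (1 - lam) * h2          # mix hidden states
    for layer in layers[k:]:               # continue on the mixed state
        h = layer(h)
    return h, lam * y1 + (1 - lam) * y2    # mixed target
```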