• Corpus ID: 245426620

DCASE 2018 task 2: iterative training, label smoothing, and background noise normalization for audio event tagging

@inproceedings{Nguyen2018DCASE2T,
  title={DCASE 2018 task 2: iterative training, label smoothing, and background noise normalization for audio event tagging},
  author={Thi Ngoc Tho Nguyen and Ngoc Khanh Nguyen and Douglas L. Jones and W. S. Gan},
  booktitle={Workshop on Detection and Classification of Acoustic Scenes and Events},
  year={2018}
}
This paper describes an approach from our submissions for DCASE 2018 Task 2: general-purpose audio tagging of Freesound content with AudioSet labels. To tackle the problem of diverse recording environments, we propose to use background noise normalization. To tackle the problem of noisy labels, we propose to use pseudo-labels for automatic label verification and label smoothing to reduce overfitting. We train several convolutional neural networks with data augmentation and different input…
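
The label smoothing mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common uniform formulation, in which the true class keeps weight 1 - epsilon and epsilon is spread evenly over all classes. The abstract does not say which variant or which epsilon the authors used, so the function names `smooth_labels` and `cross_entropy`, the value epsilon = 0.1, and the example prediction are illustrative assumptions, not the paper's implementation. The class count of 41 follows the task description.

```python
import math


def smooth_labels(label_index, num_classes, epsilon=0.1):
    """Return a uniformly smoothed target vector for one clip.

    Assumed formulation: (1 - epsilon) * one_hot + epsilon / num_classes.
    epsilon is a hypothetical default, not a value taken from the paper.
    """
    if not 0 <= label_index < num_classes:
        raise ValueError("label_index out of range")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[label_index] += 1.0 - epsilon
    return target


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


NUM_CLASSES = 41  # category count from the DCASE 2018 Task 2 description

# A very confident (hypothetical) prediction for class 3.
confident = [0.01 / (NUM_CLASSES - 1)] * NUM_CLASSES
confident[3] = 0.99

hard = smooth_labels(3, NUM_CLASSES, epsilon=0.0)  # plain one-hot target
soft = smooth_labels(3, NUM_CLASSES, epsilon=0.1)  # smoothed target

# The smoothed target penalizes the over-confident prediction more heavily.
print(cross_entropy(hard, confident), cross_entropy(soft, confident))
```

Under this formulation an over-confident prediction incurs a larger loss against the smoothed target than against the one-hot target, which is the mechanism by which label smoothing discourages fitting to possibly incorrect labels.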

Learning Sound Event Classifiers from Web Audio with Noisy Labels

Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data, and noise-robust loss functions are shown to be effective in improving performance in the presence of corrupted labels.

Audio Tagging by Cross Filtering Noisy Labels

This article presents a novel framework, named CrossFilter, to combat the noisy-label problem in audio tagging; it achieves state-of-the-art performance and even surpasses ensemble models on the FSDKaggle2018 dataset.

Semi-supervised audio tagging with deep co-training and augmentations

  • Computer Science
  • 2020
This work proposes to artificially increase the 10% of labeled files by simply duplicating them in the mini-batches during learning, and transforming them with audio data augmentations, and reports experiments on the publicly available UrbanSound8K dataset.

DCASE 2019 Task 3: A Two-Step System for Sound Event Localization and Detection (Technical Report)

A two-step system to do sound event localization and detection that combines the results of the event detector and direction-of-arrival estimator together and shows a significant improvement over the baseline solution in DCASE 2019 task 3 challenge.

A two-step system for sound event localization and detection

A two-step system to do sound event localization and detection that combines the results of the event detector and direction-of-arrival estimator together and shows a significant improvement over the baseline solution in DCASE 2019 task 3 challenge.

References

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.

CNN architectures for large-scale audio classification

This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.

TUT database for acoustic scene classification and sound event detection

The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.

Improved Regularization of Convolutional Neural Networks with Cutout

This paper shows that the simple regularization technique of randomly masking out square regions of input during training, which is called cutout, can be used to improve the robustness and overall performance of convolutional neural networks.

Audio Set: An ontology and human-labeled dataset for audio events

The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.

Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks

This simple and efficient method of semi-supervised learning for deep neural networks is proposed, trained in a supervised fashion with labeled and unlabeled data simultaneously and favors a low-density separation between classes.

Regularizing Neural Networks by Penalizing Confident Output Distributions

It is found that both label smoothing and the confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters, suggesting the wide applicability of these regularizers.

Environmental sound classification with convolutional neural networks

  • Karol J. Piczak
  • Computer Science
    2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
  • 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.

An investigation of deep neural networks for noise robust speech recognition

The noise robustness of DNN-based acoustic models can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation and can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training.