Comparative Assessment of Data Augmentation for Semi-Supervised Polyphonic Sound Event Detection

Lionel Delphin-Poulat, Rozenn Nicol, Cyril Plapous, Katell Peron
2020 27th Conference of Open Innovations Association (FRUCT)
In the context of audio ambient intelligence systems in smart buildings, polyphonic Sound Event Detection aims at detecting, localizing, and classifying any sound event recorded in a room. Today, most models are based on deep learning and require large databases for training. We propose a CRNN system that exploits unlabeled data through semi-supervised learning based on the “Mean Teacher” method, in combination with data augmentation, to overcome the limited size of the training dataset and to…
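The core of the “Mean Teacher” method referenced in the abstract is that the teacher model’s weights are an exponential moving average (EMA) of the student’s weights, updated after every training step. A minimal sketch of that update (the function name, layer-list representation, and `alpha` value here are illustrative assumptions, not the paper’s exact implementation):

```python
import numpy as np

def ema_update(teacher_weights, student_weights, alpha=0.999):
    """Mean-teacher update: each teacher weight tensor becomes an
    exponential moving average of the corresponding student tensor."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]

# Toy example: one weight array per layer.
teacher = [np.zeros(3)]
student = [np.ones(3)]
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher[0])  # each weight moves 10% of the way toward the student
```

In the full method, the student is trained with a supervised loss plus a consistency loss against the teacher’s predictions on unlabeled data; only the EMA step is shown here.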



Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization
The proposed architecture is based on a Convolutional Recurrent Neural Network (CRNN) and introduces rectangular kernels in the pooling layers to minimize information loss in the temporal dimension within the CNN module, boosting the RNN module’s performance.
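The mixup augmentation named in the title above blends two training examples and their labels with a weight sampled from a Beta distribution. A minimal sketch (the function signature and `alpha` value are illustrative, not taken from the cited paper):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: convex combination of two inputs and their labels,
    with the mixing weight drawn from Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Example: mix two spectrograms (here stand-in arrays) and scalar labels.
mx, my = mixup(np.zeros((4, 4)), 1.0, np.ones((4, 4)), 0.0)
```

Because the inputs are all-zeros and all-ones with labels 1 and 0, every entry of `mx` equals `1 - my`, which makes the label consistency easy to check.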
SpecAugment for Sound Event Detection in Domestic Environments using Ensemble of Convolutional Recurrent Neural Networks
By combining the proposed methods, sound event detection performance can be enhanced compared with the baseline algorithm; performance evaluation shows detection results of 40.89% for event-based metrics and 66.17% for segment-based metrics.
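SpecAugment, referenced in the title above, masks random time and frequency bands of a spectrogram during training. A minimal sketch of that masking step (mask widths and the single-mask-per-axis choice are illustrative assumptions):

```python
import numpy as np

def spec_augment(spec, max_t=8, max_f=8, rng=None):
    """SpecAugment-style masking: zero out one random time band and
    one random frequency band of a (freq, time) spectrogram copy."""
    rng = rng or np.random.default_rng(0)
    spec = spec.copy()  # leave the original untouched
    n_freq, n_time = spec.shape
    t0 = rng.integers(0, n_time - max_t)
    f0 = rng.integers(0, n_freq - max_f)
    spec[:, t0:t0 + rng.integers(1, max_t + 1)] = 0.0
    spec[f0:f0 + rng.integers(1, max_f + 1), :] = 0.0
    return spec
```

The full SpecAugment recipe also includes time warping; only the two masking operations, which are the parts commonly used for sound event detection, are sketched here.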
Shuffling and Mixing Data Augmentation for Environmental Sound Classification
This paper proposes a data augmentation technique that generates new sounds by shuffling and mixing two existing sounds of the same class in the dataset, creating new variations in both the temporal sequence and the density of the sound events.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year’s dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition
This work introduces a convolutional neural network (CNN) with a large input field for AED that significantly outperforms state-of-the-art methods, including Bag of Audio Words (BoAW) and classical CNNs, achieving a 16% absolute improvement.
Weakly labeled semi-supervised sound event detection using CRNN with inception module
By applying the proposed method to weakly labeled semi-supervised sound event detection, it was verified that the proposed system provides better performance than the DCASE 2018 baseline system.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision
This paper proposes a model consisting of a convolutional front end using log-mel energies as input features, a recurrent neural network sequence encoder, and a fully connected classifier network outputting an activity probability for each of the 80 considered event classes.
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
Scaper: A library for soundscape synthesis and augmentation
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined “specification”, to increase the variability of the output.