• Corpus ID: 201652449

AUDIO TAGGING WITH MINIMAL SUPERVISION BASED ON MEAN TEACHER FOR DCASE 2019 CHALLENGE TASK 2 Technical Report

@inproceedings{He2019AUDIOTW,
  title={AUDIO TAGGING WITH MINIMAL SUPERVISION BASED ON MEAN TEACHER FOR DCASE 2019 CHALLENGE TASK 2 Technical Report},
  author={Jun He and Penghao Rao and Bo Sun and Lejun Yu},
  year={2019}
}
In this report, we describe our mean-teacher-based audio tagging system and its performance on Task 2 of the DCASE 2019 challenge, which evaluates systems for audio tagging with noisy labels and minimal supervision. The proposed system is based on a VGG16 network with an attention mechanism and a gated CNN. The following data augmentation techniques are used to increase model robustness: a) scaling the signal by 0.75 to 1.5 times, b) adding Gaussian white noise at 20 dB to 40 dB. Samples… 
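
The two augmentations listed in the abstract could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' code: the abstract does not say whether "scaling" refers to amplitude or to time, nor whether the 20 dB to 40 dB range is a signal-to-noise ratio. The sketch assumes amplitude gain and SNR, and the function names `scale_amplitude` and `add_white_noise` are hypothetical.

```python
import math
import random


def scale_amplitude(signal, low=0.75, high=1.5, rng=random):
    """Multiply every sample by one random gain drawn from [low, high].

    Assumption: "scaling" in the abstract is read as amplitude scaling.
    """
    gain = rng.uniform(low, high)
    return [gain * x for x in signal]


def add_white_noise(signal, snr_db_low=20.0, snr_db_high=40.0, rng=random):
    """Add zero-mean Gaussian white noise at a random SNR drawn from the range.

    Assumption: the 20 dB to 40 dB figure is read as a signal-to-noise ratio.
    """
    snr_db = rng.uniform(snr_db_low, snr_db_high)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which can help when comparing training runs.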

References

Audio tagging with noisy labels and minimal supervision
TLDR
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
MEAN TEACHER CONVOLUTION SYSTEM FOR DCASE 2018 TASK 4
TLDR
A mean-teacher model with a context-gating convolutional neural network (CNN) and a recurrent neural network (RNN) is proposed to maximize the use of the unlabeled in-domain dataset.
DCASE 2018 Challenge baseline with convolutional neural networks
TLDR
A Python implementation of baselines for the five DCASE 2018 tasks: 1) acoustic scene classification, 2) general-purpose audio tagging, 3) bird audio detection, 4) weakly-labeled semi-supervised sound event detection, and 5) multi-channel audio tagging. The baseline source code implements convolutional neural networks, including AlexNetish and VGGish, networks originating from computer vision.
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
TLDR
The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks, but it becomes unwieldy when learning large datasets, so Mean Teacher, a method that averages model weights instead of label predictions, is proposed.
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Language Modeling with Gated Convolutional Networks
TLDR
A finite-context approach based on stacked convolutions is developed, which can be more efficient because it allows parallelization over sequential tokens; this is the first time a non-recurrent approach is competitive with strong recurrent models on these large-scale language tasks.
Freesound Datasets: A Platform for the Creation of Open Audio Datasets
Paper presented at the 18th International Society for Music Information Retrieval Conference, held in Suzhou, China, 23-27 October 2017.
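
The weight averaging described in the Mean Teacher reference above can be sketched as an exponential moving average (EMA) of the student's weights. This is a minimal illustration, not the report's implementation: weights are represented as a plain dictionary of float lists, the helper name `ema_update` is hypothetical, and the smoothing value `alpha=0.99` is an assumed default rather than a setting taken from the report.

```python
def ema_update(teacher, student, alpha=0.99):
    """Update teacher weights in place as an EMA of the student weights.

    teacher, student: dicts mapping parameter names to lists of floats
    (an assumed toy representation of model parameters).
    alpha: smoothing coefficient; higher values make the teacher change more slowly.
    """
    for name, student_values in student.items():
        teacher[name] = [
            alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher[name], student_values)
        ]
    return teacher
```

In a training loop, a call like this would typically follow each optimizer step on the student, so that the teacher tracks a smoothed history of the student's weights.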