Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training
@article{Shi2021HodgeAP, title={Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training}, author={Ziqiang Shi and Liu Liu and Huibin Lin and Rujie Liu}, journal={2020 28th European Signal Processing Conference (EUSIPCO)}, year={2021}, pages={1-5} }
In this paper, we propose a method called Hodge and Podge for sound event detection. We demonstrate Hodge and Podge on the dataset of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge Task 4. This task aims to predict the presence or absence of sound events in home environments, together with their onset and offset times. Sound event detection is challenging because large-scale, real, strongly labeled data is scarce. Recently, deep semi-supervised learning (SSL) has proven to…
References
HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods
- Computer Science, DCASE
- 2019
In this paper, we present a method called HODGEPODGE for large-scale detection of sound events using weakly labeled, synthetic, and unlabeled data proposed in the Detection and…
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
- Computer Science, DCASE
- 2019
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of the previous year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
- Computer Science, DCASE
- 2018
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
DCASE 2018 Challenge baseline with convolutional neural networks
- Computer Science, ArXiv
- 2018
DCASE 2018 has five tasks: 1) Acoustic scene classification, 2) General-purpose audio tagging, 3) Bird audio detection, 4) Weakly-labeled semi-supervised sound event detection and 5) Multi-channel audio tagging. The Python baseline source code contains implementations of convolutional neural networks, including AlexNetish and VGGish, networks originating from computer vision.
DCASE 2018 Challenge Surrey cross-task convolutional neural network baseline
- Computer Science, DCASE
- 2018
A cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a “CNN Baseline” system that implements CNNs with 4 layers and 8 layers, originating from AlexNet and VGG in computer vision.
Mean Teacher Convolution System for DCASE 2018 Task 4
- Computer Science
- 2018
A mean-teacher model with a context-gating convolutional neural network (CNN) and a recurrent neural network (RNN) is proposed to maximize the use of the unlabeled in-domain dataset.
Squeeze-and-Excitation Networks
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2020
This work proposes a novel architectural unit, termed the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets.
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
- Computer Science, NIPS
- 2017
The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks, but it becomes unwieldy when learning large datasets, so Mean Teacher, a method that averages model weights instead of label predictions, is proposed.
Audio Set: An ontology and human-labeled dataset for audio events
- Computer Science, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Interpolation Consistency Training for Semi-Supervised Learning
- Computer Science, IJCAI
- 2019
Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm, achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
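Two of the semi-supervised ideas summarized in the entries above, Mean Teacher weight averaging and interpolation consistency, can be illustrated with a minimal sketch. This is not the code of any listed paper: the function names (`ema_update`, `mix`, `ict_consistency_loss`), the plain-list representation of weights and inputs, and the default `alpha` are assumptions made for illustration only.

```python
def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher step: move each teacher weight toward the student
    weight using an exponential moving average (alpha is assumed)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def mix(a, b, lam):
    """Element-wise interpolation lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def ict_consistency_loss(student_fn, teacher_fn, u1, u2, lam):
    """Interpolation consistency on two unlabeled inputs: mean squared
    error between the student's prediction on the mixed input and the
    same mix of the teacher's predictions on the original inputs."""
    pred = student_fn(mix(u1, u2, lam))
    target = mix(teacher_fn(u1), teacher_fn(u2), lam)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    # Hypothetical toy weights; a real model would hold tensors instead.
    print(ema_update([1.0, 1.0], [0.0, 2.0], alpha=0.5))

    # A linear toy model commutes with interpolation, so the
    # consistency loss vanishes when student and teacher are identical.
    double = lambda x: [2.0 * v for v in x]
    print(ict_consistency_loss(double, double, [1.0, 2.0], [3.0, 4.0], lam=0.5))
```

In practice the teacher would be updated after every student optimizer step, and the consistency term would be added to the supervised loss with a weight that is typically ramped up during training.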