# Guided Learning Convolution System for DCASE 2019 Task 4

@inproceedings{Lin2019GuidedLC,
title={Guided Learning Convolution System for DCASE 2019 Task 4},
author={Liwei Lin and Xiangdong Wang and Hong Liu and Yueliang Qian},
booktitle={DCASE},
year={2019}
}
• Published in DCASE, 11 September 2019
• Computer Science
In this paper, we describe in detail the system we submitted to DCASE 2019 Task 4: sound event detection (SED) in domestic environments. We employ a convolutional neural network (CNN) with an embedding-level attention pooling module. To reduce the interference caused by the co-occurrence of multiple events in the unbalanced dataset, we use the disentangled feature to improve the model's performance. To take advantage of the unlabeled data, we adopt Guided Learning for…
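The abstract names an embedding-level attention pooling module as the core of the model. The sketch below illustrates the general idea in NumPy under stated assumptions: the CNN front end is assumed to have already produced a (T, D) matrix of frame-level embeddings, and the parameter names (`att_w`, `cls_w`, `cls_b`) and the exact linear projections are hypothetical, not taken from the paper. Attention weights are normalized over time separately for each event class, the frame embeddings themselves (rather than frame-level predictions) are pooled into one clip-level embedding per class, and a per-class classifier maps each pooled embedding to a clip-level probability.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def embedding_level_attention_pooling(frames, att_w, cls_w, cls_b):
    """Hypothetical embedding-level attention pooling (illustrative sketch).

    frames: (T, D) frame-level embeddings from a CNN front end (assumed)
    att_w:  (D, C) attention projection, one column per event class
    cls_w:  (D, C) per-class classifier weights
    cls_b:  (C,)   per-class classifier bias
    Returns clip-level probabilities (C,) and attention weights (T, C).
    """
    # One attention score per frame and class, normalized over time (axis 0).
    alpha = softmax(frames @ att_w, axis=0)          # (T, C)
    # Pool the embeddings themselves: one D-dim clip embedding per class.
    clip_emb = alpha.T @ frames                       # (C, D)
    # Score class c's pooled embedding with class c's weight vector only.
    logits = np.einsum("cd,dc->c", clip_emb, cls_w) + cls_b
    return sigmoid(logits), alpha


def frame_level_probs(frames, cls_w, cls_b):
    # Apply the same per-class classifier to every frame embedding.
    return sigmoid(frames @ cls_w + cls_b)            # (T, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, C = 50, 16, 10  # frames, embedding size, event classes (illustrative)
    frames = rng.standard_normal((T, D))
    att_w = rng.standard_normal((D, C))
    cls_w = rng.standard_normal((D, C))
    cls_b = np.zeros(C)

    clip_probs, alpha = embedding_level_attention_pooling(frames, att_w, cls_w, cls_b)
    print(clip_probs.shape, alpha.shape)                   # (10,) (50, 10)
    print(frame_level_probs(frames, cls_w, cls_b).shape)   # (50, 10)
```

Because pooling happens before classification in this formulation, the same per-class classifier can also be applied to each frame embedding to obtain frame-level scores for detection, which is what the hypothetical `frame_level_probs` helper does.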

## Citing Papers

Detecting Sound Events Using Convolutional Macaron Net With Pseudo Strong Labels
• 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), 2021
This paper addresses the lack of strongly labeled data by using pseudo strong labels approximated with Convolutive Nonnegative Matrix Factorization, and trains a novel architecture called the Convolutional Macaron Net (CMN), which combines a convolutional neural network (CNN) with Macaron Net (MN), in a semi-supervised manner.
Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection
• DCASE, 2019
The paper proposes a deep learning model that integrates Non-Negative Matrix Factorization (NMF) with a Convolutional Neural Network (CNN), using NMF to provide approximate strong labels for the weakly labeled data.
Guided multi-branch learning systems for DCASE 2020 Task 4
• ArXiv, 2020
The experimental results show that multi-branch learning (MBL) can improve model performance and that using SS has great potential to improve the performance of an SED ensemble system.
Joint Training of Guided Learning and Mean Teacher Models for Sound Event Detection
• 2020
The proposed model comprises a feature-level front end based on convolutional neural networks (CNN), followed by both embedding-level and instance-level back-end attention modules; in post-processing, a set of adaptive median windows, one per sound event class, smooths the frame-level predictions.
Detecting Acoustic Events Using Convolutional Macaron Net
• ArXiv, 2020
This paper addresses the lack of strongly labeled data by using pseudo strong labels approximated with Convolutive Nonnegative Matrix Factorization (CNMF), and trains a new architecture combining a convolutional neural network with Macaron Net, which it terms the convolutional Macaron net (CMN).
Multi-Scale Residual CRNN with Data Augmentation for DCASE 2020 Task 4 (Technical Report)
• 2020
This technical report improves the baseline by applying a variety of data augmentation methods and synthesizing more complex training data, and presents a multi-scale residual convolutional recurrent neural network (CRNN) to address multi-scale detection.
The Academia Sinica System of Sound Event Detection and Separation for DCASE 2020 (Technical Report)
• 2020
This report presents a system for sound event detection and separation in domestic environments for DCASE 2020. The system outperforms the baseline, using the student model as the back-end classifier together with the guided learning mechanism and Mean Teacher for weakly supervised and semi-supervised learning.
Research on Semi-Supervised Sound Event Detection Based on Mean Teacher Models Using ML-LoBCoD-NET
• IEEE Access, 2020
The paper proposes the MRNN-Att network, which combines ML-LoBCoD-NET, a recurrent neural network (RNN), and an attention network to fuse different features for weakly supervised sound event detection.
Couple Learning: Mean Teacher with PLG Model Improves the Results of Sound Event Detection
• 2022
The paper proposes an effective Couple Learning method that combines a well-trained model with a Mean Teacher model. It reduces the impact of noise in the pseudo-labels introduced by detection errors and increases the amount of strongly and weakly labeled data, improving the performance of the Mean Teacher method.
Couple Learning for semi-supervised sound event detection
• 2021
The paper proposes an effective Couple Learning method that combines a well-trained model with a Mean Teacher model, improving the Mean Teacher method's performance and reducing the impact of noise in the pseudo-labels introduced by detection errors.

## References

Showing the first 10 of 16 references.
Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection
• IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020
Experiments show that the proposed specialized decision surface (SDS) and disentangled feature (DF) significantly improve the detection performance of the embedding-level multiple instance learning (MIL) approach with an attention pooling module, outperforming the first-place system in the challenge by 6.6 percentage points.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
• DCASE, 2019
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which combines part of the previous year's dataset with an additional synthetic, strongly labeled dataset provided this year, and describes that dataset in more detail.
What you need is a more professional teacher
• ArXiv, 2019
This work designs two very different models for different targets: one pursues finer-grained information for the final target, and the other is more professional, achieving higher coarse-level classification accuracy.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
• ICML, 2015
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps and beats the original model by a significant margin.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
• DCASE, 2018
This paper presents DCASE 2018 Task 4, which evaluates systems for large-scale detection of sound events using weakly labeled data (without time boundaries) and explores exploiting a large amount of unbalanced, unlabeled training data together with a small weakly labeled training set to improve system performance.
Guided Learning for the combination of weakly-supervised and semi-supervised learning
• 2019
This work presents an end-to-end semi-supervised learning process, termed Guided Learning, for two different models to improve training efficiency, and presents a new approach that outperforms the first-place result on DCASE 2018 Task 4, which employs Mean Teacher with a well-designed CRNN.
Adam: A Method for Stochastic Optimization
• ICLR, 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Audio Set: An ontology and human-labeled dataset for audio events
• 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
The paper describes the creation of Audio Set, a large-scale dataset of manually annotated audio events that aims to bridge the gap in data availability between image and audio research and to substantially stimulate the development of high-performance audio event recognizers.
Scaper: A library for soundscape synthesis and augmentation
• 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined "specification", increasing the variability of the output.
Freesound Datasets: A Platform for the Creation of Open Audio Datasets
• ISMIR, 2017
Paper presented at the 18th International Society for Music Information Retrieval Conference, held in Suzhou, China, from 23 to 27 October 2017.