Guided Learning Convolution System for DCASE 2019 Task 4

@inproceedings{Lin2019GuidedLC,
  title={Guided Learning Convolution System for DCASE 2019 Task 4},
  author={Liwei Lin and Xiangdong Wang and Hong Liu and Yueliang Qian},
  booktitle={DCASE},
  year={2019}
}
In this paper, we describe in detail the system we submitted to DCASE2019 task 4: sound event detection (SED) in domestic environments. We employ a convolutional neural network (CNN) with an embedding-level attention pooling module to solve it. By considering the interference caused by the co-occurrence of multiple events in the unbalanced dataset, we utilize the disentangled feature to raise the performance of the model. To take advantage of the unlabeled data, we adopt Guided Learning for… 
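The embedding-level attention pooling mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption for illustration rather than the authors' implementation. Frame-level embeddings for one event class are scored, the scores are normalized over time with a softmax, and the weighted sum of embeddings is passed to a sigmoid classifier to produce a clip-level probability. The function `attention_pool` and the parameters `att_w`, `att_b`, `clf_w`, and `clf_b` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(frames, att_w, att_b, clf_w, clf_b):
    """Pool T frame embeddings (T x D) into one clip-level probability.

    Hypothetical single-class sketch: att_w/att_b score each frame,
    clf_w/clf_b classify the pooled embedding.
    """
    # One attention score per frame, normalized over the time axis.
    scores = [dot(att_w, x) + att_b for x in frames]
    weights = softmax(scores)
    # Attention-weighted sum of frame embeddings (the clip-level embedding).
    dim = len(frames[0])
    pooled = [sum(w * x[d] for w, x in zip(weights, frames)) for d in range(dim)]
    # Sigmoid classifier on the pooled embedding.
    clip_prob = 1.0 / (1.0 + math.exp(-(dot(clf_w, pooled) + clf_b)))
    return clip_prob, weights


# Toy input: three frames of 2-dimensional embeddings (made-up values).
frames = [[0.1, 0.0], [2.0, 1.5], [0.2, 0.1]]
clip_prob, weights = attention_pool(frames, [1.0, 1.0], 0.0, [1.0, -0.5], 0.0)
```

In this sketch the per-frame attention weights can also be read as a rough indication of where in time the event occurs, which is the usual motivation for attention pooling when only clip-level (weak) labels are available.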

Detecting Sound Events Using Convolutional Macaron Net With Pseudo Strong Labels
  • T. K. Chan, C. Chin
  • Computer Science
    2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP)
  • 2021
TLDR
This paper proposes addressing the lack of strongly labeled data by using pseudo strongly labeled data approximated using Convolutive Nonnegative Matrix Factorization, and trains a novel architecture called the Convolutional Macaron Net (CMN), which combines a Convolutional Neural Network (CNN) with MN, in a semi-supervised manner.
Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection
TLDR
A deep learning model is proposed that integrates Non-Negative Matrix Factorization (NMF) with a Convolutional Neural Network (CNN), using NMF to provide approximate strong labels for the weakly labeled data.
Guided multi-branch learning systems for DCASE 2020 Task 4
TLDR
The experimental results prove that MBL can improve the model performance and using SS has great potential to improve the performance of SED ensemble system.
JOINT TRAINING OF GUIDED LEARNING AND MEAN TEACHER MODELS FOR SOUND EVENT DETECTION
TLDR
This paper's proposed model structure includes a feature-level front-end based on convolutional neural networks (CNN), followed by both embedding-level and instance-level back-end attention modules, and a set of adaptive median windows for individual sound events is used to smooth the frame-level predictions in post-processing.
Detecting Acoustic Events Using Convolutional Macaron Net
TLDR
This paper proposes to address the lack of strongly labeled data by using pseudo strongly labeled data approximated using Convolutive Nonnegative Matrix Factorization (CNMF), and trains a new architecture combining a Convolutional Neural Network with a Macaron Net, termed the Convolutional Macaron Net (CMN).
Semi-Supervised NMF-CNN for Sound Event Detection
TLDR
A combinative approach using Nonnegative Matrix Factorization (NMF) and Convolutional Neural Network (CNN) is proposed for audio clip Sound Event Detection (SED) to approximate strong labels for the weakly labeled data.
MULTI-SCALE RESIDUAL CRNN WITH DATA AUGMENTATION FOR DCASE 2020 TASK 4 Technical Report
TLDR
This technical report improves the baseline by using a variety of data augmentation methods and synthesizing more complex synthetic data for training, and presents a multi-scale residual convolutional recurrent neural network (CRNN) to solve the problem of multi-scale detection.
Guided Multi-Branch Learning Systems for Sound Event Detection with Sound Separation
TLDR
The experimental results prove that MBL can improve the model performance and using SS has great potential to improve the performance of SED ensemble system.
THE ACADEMIA SINICA SYSTEM OF SOUND EVENT DETECTION AND SEPARATION FOR DCASE 2020 Technical Report
TLDR
This report presents a system for sound event detection and separation in domestic environments for DCASE 2020 that outperforms the baseline, using the student model as the back-end classifier and a guided learning mechanism together with Mean Teacher to carry out weakly-supervised and semi-supervised learning.
Research on Semi-Supervised Sound Event Detection Based on Mean Teacher Models Using ML-LoBCoD-NET
TLDR
The MRNN-Att network is proposed, which combines the ML-LoBCoD-NET, a recurrent neural network (RNN), and an attention network to fuse the different features for the weakly-supervised sound event detection task.

References

Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection
TLDR
Experiments show that the proposed SDS and DF significantly improve the detection performance of the embedding-level MIL approach with an attention pooling module and outperform the first place system in the challenge by 6.6 percentage points.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
TLDR
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which combines part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
Guided Learning for Weakly-Labeled Semi-Supervised Sound Event Detection
TLDR
An end-to-end semi-supervised learning process is presented for these two models that enables their abilities to rise alternately, and the approach is shown to achieve competitive performance on the DCASE2018 Task4 dataset.
What you need is a more professional teacher
TLDR
This work designs two extremely different models for different targets, one of which just pursues finer information for the final target, and one which is more professional to achieve higher coarse-level classification accuracy.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
TLDR
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
Guided Learning for the combination of weakly-supervised and semi-supervised learning
TLDR
This work presents an end-to-end semi-supervised learning process termed Guided Learning for these two different models to improve the training efficiency, and presents a new approach that outperforms the first place result on DCASE2018 Task 4, which employs Mean Teacher with a well-designed CRNN network.
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Scaper: A library for soundscape synthesis and augmentation
TLDR
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined, “specification”, to increase the variability of the output.