Corpus ID: 53156046

Weakly supervised CRNN system for sound event detection with large-scale unlabeled in-domain data

@article{Wang2018WeaklySC,
  title={Weakly supervised CRNN system for sound event detection with large-scale unlabeled in-domain data},
  author={Dezhi Wang and Lilun Zhang and Chang-chun Bao and Kele Xu and Boqing Zhu and Qiuqiang Kong},
  journal={ArXiv},
  year={2018},
  volume={abs/1811.00301}
}
Sound event detection (SED) is typically posed as a supervised learning problem requiring training data with strong temporal labels of sound events. However, producing datasets with strong labels normally requires prohibitive labor cost, which limits the practical application of supervised SED methods. Recent advances in SED therefore focus on detecting sound events by taking advantage of weakly labeled or unlabeled training data. In this paper, we propose a joint framework to… 
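In the weakly labeled setting described above, only clip-level tags are available, so frame-level event predictions must be aggregated into a clip-level score before a loss can be computed against the tag, while the frame-level scores themselves provide the temporal localization. A minimal numpy sketch of this aggregation step, using max pooling over time as one common choice (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def clip_probability_max(frame_probs):
    """Aggregate frame-level event probabilities of shape (T, C)
    into clip-level probabilities of shape (C,) by max pooling
    over the time axis.

    Under weak labels only the clip-level tag is known, so the
    training loss is computed on this clip probability, while the
    per-frame probabilities localize the events in time.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    return frame_probs.max(axis=0)

# Toy example: 4 frames, 2 event classes.
frames = np.array([
    [0.1, 0.2],
    [0.9, 0.1],   # class 0 clearly active in this frame
    [0.2, 0.3],
    [0.1, 0.8],   # class 1 active in this frame
])
print(clip_probability_max(frames))  # -> [0.9 0.8]
```

Max pooling is only one option; mean pooling and attention-based pooling are frequently used alternatives, and several of the works listed below study exactly this design choice.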


Multi Model-Based Distillation for Sound Event Detection
TLDR
This letter proposes a novel multi-model-based distillation approach for sound event detection that makes use of the knowledge from multiple teacher models which are complementary in detecting sound events.
Noise Robust Sound Event Detection Using Deep Learning and Audio Enhancement
TLDR
A unified approach to sound event detection is proposed that takes advantage of both deep learning and audio enhancement: a convolutional recurrent neural network is combined with a deep neural network to improve the performance of the SED classifiers, and an optimally-modified log-spectral amplitude estimator based audio enhancement method is employed.
Multi-Representation Knowledge Distillation For Audio Classification
TLDR
A novel end-to-end collaborative learning framework that takes multiple representations as input to train the models in parallel, improving classification performance and achieving state-of-the-art results on both acoustic scene classification tasks and general audio tagging tasks.
A Mobile Application for Sound Event Detection
TLDR
The architecture of the solution includes offline training and online detection; the latter covers acquisition of sensor data, processing of audio signals, and detecting and recording of sound events.

References

SHOWING 1-10 OF 29 REFERENCES
Sound event detection using weakly-labeled semi-supervised data with GCRNNS, VAT and Self-Adaptive Label Refinement
TLDR
A gated convolutional recurrent neural network based approach to solve Task 4 of the DCASE 2018 challenge, large-scale weakly labelled semi-supervised sound event detection in domestic environments, which introduces self-adaptive label refinement, a method that allows unsupervised adaptation of the trained system to refine the accuracy of frame-level class predictions.
Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
TLDR
This paper treats SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality, and develops a family of adaptive pooling operators, referred to as autopool, which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
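The autopool idea summarized above is a softmax-weighted mean over time, with a scalar parameter that interpolates between common pooling operators. A small numpy sketch of this interpolation, with the pooling parameter passed as a plain argument rather than learned per class as in the paper (names are illustrative):

```python
import numpy as np

def autopool(x, alpha):
    """Auto-pool a sequence of frame scores x over the time axis:
    a softmax(alpha * x)-weighted mean of x.

    alpha = 0       -> unweighted mean pooling
    alpha -> +inf   -> max pooling
    alpha -> -inf   -> min pooling

    In the cited work alpha is a learned per-class parameter; here
    it is a fixed argument for illustration.
    """
    x = np.asarray(x, dtype=float)
    w = np.exp(alpha * x)
    w = w / w.sum(axis=0, keepdims=True)
    return (w * x).sum(axis=0)

x = np.array([0.1, 0.9, 0.2])
print(autopool(x, 0.0))    # mean pooling: 0.4
print(autopool(x, 100.0))  # approaches max pooling: ~0.9
```

Because the operator is differentiable in alpha, the degree of "max-ness" of the pooling can itself be fit to the data during training.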
Sound event detection using weakly labeled dataset with stacked convolutional and recurrent neural network
TLDR
A stacked convolutional and recurrent neural network with two prediction layers in sequence, one for the strong labels followed by one for the weak labels, is proposed, which achieves the best error rate of 0.84 for strong labels and an F-score of 43.3% for weak labels on the unseen test split.
LARGE-SCALE WEAKLY LABELLED SEMI-SUPERVISED CQT BASED SOUND EVENT DETECTION IN DOMESTIC ENVIRONMENTS Technical Report
This paper proposes a constant-Q transform based input feature for the baseline architecture to learn the start and end times of sound events (strong labels) in an audio recording given just the
Sound event detection from weak annotations: weighted-GRU versus multi-instance-learning
TLDR
This paper addresses the detection of audio events in domestic environments in the case where a weakly annotated dataset is available for training, and explores two approaches, including a "weighted-GRU" (WGRU), in which a convolutional recurrent neural network is trained for classification and then exploited at the output of the time-distributed dense layer to perform localization.
FRAMECNN : A WEAKLY-SUPERVISED LEARNING FRAMEWORK FOR FRAME-WISE ACOUSTIC EVENT DETECTION AND CLASSIFICATION
In this paper, we describe our contribution to the challenge of detection and classification of acoustic scenes and events (DCASE2017). We propose frameCNN, a novel weakly-supervised learning
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly
A joint detection-classification model for audio tagging of weakly labelled data
TLDR
This work proposes a joint detection-classification (JDC) model to detect and classify the audio clip simultaneously, and shows that the JDC model reduces the equal error rate (EER) from 19.0% to 16.9%.
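The joint detection-classification idea summarized above pairs a classifier (what each frame contains) with a detector (how much each frame should count), and aggregates frame predictions with detector-normalized weights. A hedged numpy sketch of that aggregation step only, with illustrative names not taken from the paper:

```python
import numpy as np

def jdc_clip_probability(class_probs, detect_probs, eps=1e-8):
    """JDC-style aggregation (sketch, not the authors' exact model).

    class_probs:  (T, C) frame-level classification probabilities
    detect_probs: (T, C) frame-level detection (attention) scores

    Each frame's classification output is weighted by its
    normalized detection score, so frames the detector deems
    irrelevant contribute little to the clip-level probability.
    """
    class_probs = np.asarray(class_probs, dtype=float)
    detect_probs = np.asarray(detect_probs, dtype=float)
    weights = detect_probs / (detect_probs.sum(axis=0, keepdims=True) + eps)
    return (weights * class_probs).sum(axis=0)

# Toy example: 2 frames, 2 classes; each detector attends to one frame.
frames_cls = np.array([[0.9, 0.1],
                       [0.1, 0.9]])
frames_det = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
print(jdc_clip_probability(frames_cls, frames_det))  # approximately [0.9 0.9]
```

Compared with plain max or mean pooling, the detector branch lets the model learn which frames matter per class instead of hard-coding that choice.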
A Joint Separation-Classification Model for Sound Event Detection of Weakly Labelled Data
TLDR
A joint separation-classification model trained only on weakly labelled audio data (that is, only the tags of an audio recording are known but the times of the events are unknown) is proposed, outperforming the deep neural network baseline of 0.29.
Orthogonality-Regularized Masked NMF for Learning on Weakly Labeled Audio Data
TLDR
It is demonstrated that the proposed Orthogonality-Regularized Masked NMF (ORM-NMF) can be used for audio event detection of rare events, evaluated on the development data from Task 2 of the DCASE2017 Challenge.
...