Corpus ID: 235742861

Self-training with noisy student model and semi-supervised loss function for dcase 2021 challenge task 4

@article{Kim2021SelftrainingWN,
  title={Self-training with noisy student model and semi-supervised loss function for dcase 2021 challenge task 4},
  author={Nam Kyun Kim and Hong Kook Kim},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.02569}
}
This report proposes a polyphonic sound event detection (SED) method for the DCASE 2021 Challenge Task 4. The proposed SED model consists of two stages: a mean-teacher model that provides target labels for weakly labeled or unlabeled data, and a self-training-based noisy student model that predicts strong labels for sound events. The mean-teacher model, which uses the residual convolutional recurrent neural network (RCRNN) for both its teacher and student networks, is first trained using…
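As a rough illustration of the two-stage handoff described above: the trained mean-teacher model can generate frame-level pseudo labels for unlabeled clips, which then serve as strong targets for the noisy student. The sketch below assumes PyTorch models with a sigmoid output head; the function name and the 0.5 threshold are illustrative assumptions, not the paper's exact procedure.

    import torch

    @torch.no_grad()
    def make_pseudo_labels(teacher, unlabeled_loader, threshold=0.5):
        """Stage-1 teacher produces strong (frame-level) targets for stage 2."""
        teacher.eval()
        pseudo = []
        for x in unlabeled_loader:
            probs = torch.sigmoid(teacher(x))      # per-frame event probabilities
            strong = (probs > threshold).float()   # binarize into strong pseudo labels
            pseudo.append((x, strong))
        return pseudo

The noisy student would then be trained on these (input, pseudo-label) pairs under input augmentation, as in the self-training literature.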
1 Citation

FilterAugment: An Acoustic Environmental Data Augmentation Method
Acoustic environments affect the acoustic characteristics of sounds to be recognized through physical interaction with sound wave propagation. Thus, training acoustic models for audio and speech tasks…

References

Showing 1-10 of 16 references.
Polyphonic Sound Event Detection Based on Residual Convolutional Recurrent Neural Network With Semi-Supervised Loss Function
Proposes a two-stage polyphonic SED model for the case where strongly labeled data are limited but weakly labeled and unlabeled data are available, and compares its performance with those of the baseline and top-ranked models from both challenges.
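A common form of such a semi-supervised loss combines a supervised binary cross-entropy term on labeled clips with a consistency term that pulls the student's predictions toward the teacher's. The sketch below is a generic version of this idea, not this reference's exact formulation; the consistency weight w is an assumed hyperparameter.

    import torch
    import torch.nn.functional as F

    def semi_supervised_loss(student_logits, teacher_logits, targets, labeled_mask, w=1.0):
        # Supervised BCE on the labeled portion of the batch
        sup = F.binary_cross_entropy_with_logits(
            student_logits[labeled_mask], targets[labeled_mask])
        # Consistency (MSE) between student and teacher posteriors on all clips;
        # the teacher is treated as a fixed target (no gradient)
        cons = F.mse_loss(torch.sigmoid(student_logits),
                          torch.sigmoid(teacher_logits).detach())
        return sup + w * cons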
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
The recently proposed Temporal Ensembling achieves state-of-the-art results on several semi-supervised learning benchmarks but becomes unwieldy when learning from large datasets, so Mean Teacher, a method that averages model weights instead of label predictions, is proposed.
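The core of Mean Teacher is an exponential moving average (EMA) of the student's weights, which becomes the teacher. A minimal PyTorch sketch, assuming the student and teacher share the same architecture (alpha = 0.999 is a typical value, not one taken from this paper):

    import torch

    @torch.no_grad()
    def update_teacher(student, teacher, alpha=0.999):
        # teacher <- alpha * teacher + (1 - alpha) * student, parameter by parameter
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(alpha).add_(s, alpha=1.0 - alpha)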
MEAN TEACHER WITH DATA AUGMENTATION FOR DCASE 2019 TASK 4 Technical Report
In this paper, we present our neural network for the DCASE 2019 challenge's Task 4 (sound event detection in domestic environments) [1]. The goal of the task is to evaluate systems for the detection…
Automated audio captioning with recurrent neural networks
Results from the evaluation metrics show that the proposed method can predict words appearing in the original caption, but not always in the correct order.
Sound event detection in domestic environments with weakly labeled data and soundscape synthesis
Introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
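SpecAugment's masking policies are easy to sketch: zero out a few random frequency bands and time spans of a log-mel spectrogram (the original method also includes time warping, omitted here). The mask sizes below are illustrative defaults, not the paper's tuned values.

    import numpy as np

    def spec_augment(spec, n_freq_masks=2, F=8, n_time_masks=2, T=20):
        # spec: (freq_bins, time_frames) log-mel spectrogram
        out = spec.copy()
        n_mels, n_frames = out.shape
        for _ in range(n_freq_masks):
            f = np.random.randint(0, F + 1)
            f0 = np.random.randint(0, max(1, n_mels - f))
            out[f0:f0 + f, :] = 0.0                # frequency mask
        for _ in range(n_time_masks):
            t = np.random.randint(0, T + 1)
            t0 = np.random.randint(0, max(1, n_frames - t))
            out[:, t0:t0 + t] = 0.0                # time mask
        return out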
Audio Set: An ontology and human-labeled dataset for audio events
Describes the creation of Audio Set, a large-scale dataset of manually annotated audio events that endeavors to bridge the gap in data availability between image and audio research and to substantially stimulate the development of high-performance audio event recognizers.
Scaper: A library for soundscape synthesis and augmentation
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined “specification”, increasing the variability of the output.
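For reference, generating a soundscape with Scaper looks roughly like the following, using the library's tuple-based distribution specifications; the paths and labels are placeholders, and signatures should be checked against the current Scaper documentation.

    import scaper

    sc = scaper.Scaper(duration=10.0, fg_path='foreground', bg_path='background')
    sc.ref_db = -50
    sc.add_background(label=('const', 'noise'),
                      source_file=('choose', []),
                      source_time=('const', 0))
    sc.add_event(label=('choose', []),              # pick any foreground label
                 source_file=('choose', []),
                 source_time=('const', 0),
                 event_time=('uniform', 0, 9),      # onset sampled uniformly
                 event_duration=('truncnorm', 3.0, 1.0, 0.5, 5.0),
                 snr=('normal', 10, 3),
                 pitch_shift=None,
                 time_stretch=None)
    sc.generate('soundscape.wav', 'soundscape.jams')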
CBAM: Convolutional Block Attention Module
The proposed Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks, can be integrated into any CNN architecture seamlessly with negligible overhead and is end-to-end trainable along with the base CNN.
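CBAM applies channel attention followed by spatial attention to a feature map. A compact PyTorch rendition of the module as described in that paper (reduction ratio 16 and a 7x7 spatial kernel are the paper's defaults):

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )
        def forward(self, x):
            b, c, _, _ = x.shape
            avg = self.mlp(x.mean(dim=(2, 3)))      # average-pooled descriptor
            mx = self.mlp(x.amax(dim=(2, 3)))       # max-pooled descriptor
            return torch.sigmoid(avg + mx).view(b, c, 1, 1)

    class SpatialAttention(nn.Module):
        def __init__(self, kernel_size=7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        def forward(self, x):
            avg = x.mean(dim=1, keepdim=True)       # channel-wise average map
            mx = x.amax(dim=1, keepdim=True)        # channel-wise max map
            return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

    class CBAM(nn.Module):
        def __init__(self, channels, reduction=16, kernel_size=7):
            super().__init__()
            self.ca = ChannelAttention(channels, reduction)
            self.sa = SpatialAttention(kernel_size)
        def forward(self, x):
            x = x * self.ca(x)                      # refine channels first
            return x * self.sa(x)                   # then spatial locations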
mixup: Beyond Empirical Risk Minimization
Proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, improving the generalization of state-of-the-art neural network architectures.
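The mixup rule is essentially a one-liner: sample lambda from a Beta(alpha, alpha) distribution and blend two examples together with their labels. A minimal sketch (alpha = 0.2 is a typical value, not prescribed by this reference):

    import numpy as np

    def mixup(x1, y1, x2, y2, alpha=0.2):
        lam = np.random.beta(alpha, alpha)          # mixing coefficient in [0, 1]
        x = lam * x1 + (1.0 - lam) * x2             # blended input
        y = lam * y1 + (1.0 - lam) * y2             # blended (soft) label
        return x, y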