SpecAugment for Sound Event Detection in Domestic Environments using Ensemble of Convolutional Recurrent Neural Networks
@inproceedings{Lim2019SpecAugmentFS, title={SpecAugment for Sound Event Detection in Domestic Environments using Ensemble of Convolutional Recurrent Neural Networks}, author={Wootaek Lim}, booktitle={DCASE}, year={2019} }
In this paper, we present a method to detect sound events in domestic environments using a small weakly labeled dataset, a large unlabeled dataset, and a strongly labeled synthetic dataset, as proposed in Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge. To solve the problem, we use a convolutional recurrent neural network composed of stacks of convolutional neural networks and bi-directional gated recurrent units. Moreover, we propose various methods such as SpecAugment…
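The abstract names SpecAugment as one of the augmentation methods. As a rough illustration of the masking idea only, the sketch below zeroes random frequency bands and time spans of a spectrogram. It is not the paper's implementation: the function name `spec_augment`, the default mask widths, the zero fill value, and the `(freq_bins, time_frames)` layout are all assumptions made for this example, and the time-warping step of the original SpecAugment method is omitted.

```python
import numpy as np


def spec_augment(spec, num_freq_masks=1, num_time_masks=1,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Return a masked copy of `spec`, assumed shape (freq_bins, time_frames).

    Hypothetical helper: the parameter names and defaults are illustrative,
    not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)  # leave the input untouched
    n_freq, n_time = out.shape

    # Frequency masking: zero a random band of consecutive frequency bins.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
        start = int(rng.integers(0, n_freq - width + 1))
        out[start:start + width, :] = 0.0

    # Time masking: zero a random span of consecutive time frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(max_time_width, n_time) + 1))
        start = int(rng.integers(0, n_time - width + 1))
        out[:, start:start + width] = 0.0

    return out


if __name__ == "__main__":
    # Dummy input standing in for a log-mel spectrogram (64 bins, 500 frames).
    dummy = np.ones((64, 500))
    masked = spec_augment(dummy, rng=np.random.default_rng(0))
    print(masked.shape)
```

In a training loop, a function like this would typically be applied to each input spectrogram before it is fed to the network, with fresh random masks drawn every time.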
2 Citations
Comparative Assessment of Data Augmentation for Semi-Supervised Polyphonic Sound Event Detection
- Computer Science, 2020 27th Conference of Open Innovations Association (FRUCT)
- 2020
This work proposes a CRNN system that exploits unlabeled data through semi-supervised learning based on the “Mean teacher” method, combined with data augmentation to overcome the limited size of the training dataset and further improve performance.
On Open-Set Classification with L3-Net Embeddings for Machine Listening Applications
- Computer Science, 2020 28th European Signal Processing Conference (EUSIPCO)
- 2021
A neural network that combines all L3-Net embeddings belonging to one recording into a single vector using an x-vector mechanism is presented, along with an open-set classification system based on it.
References
Weakly labeled semi-supervised sound event detection using CRNN with inception module
- Computer Science, DCASE
- 2018
By applying the proposed method to a weakly labeled semi-supervised sound event detection task, it was verified that the proposed system performs better than the DCASE 2018 baseline system.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
- Computer Science, DCASE
- 2019
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year; the new dataset is described in more detail.
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
- Computer Science, IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2018
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection
- Computer Science, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
This work presents a hybrid approach that combines acoustic-driven event boundary detection with supervised label inference using a deep neural network. The approach leverages the benefits of both unsupervised and supervised methodologies and takes advantage of large amounts of unlabeled data, making it ideal for large-scale weakly labeled event detection.
Sound Event Detection from Partially Annotated Data: Trends and Challenges
- Computer Science
- 2019
A detailed analysis of the impact of the time segmentation, the event classification and the methods used to exploit unlabeled data on the final performance of sound event detection systems is proposed.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
- Computer Science, DCASE
- 2018
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
DCASE 2018 Challenge baseline with convolutional neural networks
- Computer Science, ArXiv
- 2018
DCASE 2018 has five tasks: 1) acoustic scene classification, 2) general-purpose audio tagging, 3) bird audio detection, 4) weakly labeled semi-supervised sound event detection, and 5) multi-channel audio tagging. The Python baseline source code implements convolutional neural networks, including AlexNetish and VGGish, networks originating from computer vision.
Mean Teacher Convolution System for DCASE 2018 Task 4
- Computer Science
- 2018
A mean-teacher model with a context-gating convolutional neural network (CNN) and a recurrent neural network (RNN) is proposed to maximize the use of the unlabeled in-domain dataset.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
- Computer Science, INTERSPEECH
- 2019
This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
The SINS Database for Detection of Daily Activities in a Home Environment Using an Acoustic Sensor Network
- Computer Science, DCASE
- 2017
A database is introduced that was recorded in one home over a period of one week. It contains activities performed in a spontaneous manner, captured with an acoustic sensor network and recorded as a continuous stream.