Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments
@inproceedings{Ebbers2021SelfTrainedAT, title={Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments}, author={Janek Ebbers and Reinhold H{\"a}b-Umbach}, booktitle={DCASE}, year={2021} }
In this paper we present our system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments, where it ranked fourth. Our solution advances the system we used in the previous edition of the task. We use a forward-backward convolutional recurrent neural network (FBCRNN) for tagging and pseudo labeling followed by tag-conditioned sound event detection (SED) models…
4 Citations
A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
- Computer Science, ICASSP
- 2022
A benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection, is proposed; results show that systems adapted to provide coarse segmentation outputs are more robust to different target to non-target signal-to-noise ratios and to the time localization of the original event.
ANALYSIS OF THE SOUND EVENT DETECTION METHODS AND SYSTEMS
- Computer Science, Advanced Information Systems
- 2022
A number of problems associated with the development of sound event detection systems are presented, such as the deviation for each environment and each sound category, overlapping audio events, and unreliable training data.
FilterAugment: An Acoustic Environmental Data Augmentation Method
- Physics
- 2021
Acoustic environments affect acoustic characteristics of sound to be recognized by physically interacting with sound wave propagation. Thus, training acoustic models for audio and speech tasks…
Threshold Independent Evaluation of Sound Event Detection Scores
- Computer Science, ICASSP
- 2022
A method which allows for computing system performance on an evaluation set for all possible thresholds jointly, enabling accurate computation not only of the PSD-ROC and PSDS but also of other collar-based and intersection-based performance curves.
References
SHOWING 1-10 OF 27 REFERENCES
Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-supervised Sound Event Detection
- Computer Science, ArXiv
- 2021
The presented system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge is described, and a tag-conditioned CNN to complement SED is proposed, trained to predict strong labels while using weak labels as additional input.
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
- Computer Science, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly…
Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision
- Computer Science, DCASE
- 2019
This paper proposes a model consisting of a convolutional front end using log-mel-energies as input features, a recurrent neural network sequence encoder and a fully connected classifier network outputting an activity probability for each of the 80 considered event classes.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
- Computer Science, DCASE
- 2019
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year, and describes it in more detail.
Weakly-Supervised Sound Event Detection with Self-Attention
- Computer Science, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
A novel sound event detection method that incorporates a self-attention mechanism of the Transformer for a weakly-supervised learning scenario and introduces a special tag token into the input sequence for weak label prediction, which enables the aggregation of the whole sequence information.
Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
- Computer Science, IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2018
This paper treats SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality, and develops a family of adaptive pooling operators (referred to as autopool) which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
A Closer Look at Weak Label Learning for Audio Events
- Computer Science, ArXiv
- 2018
This work describes a CNN-based approach for weakly supervised training of audio events, describes important characteristics that naturally arise in weakly supervised learning of sound events, and shows how these aspects of weak labels affect the generalization of models.
A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling
- Computer Science, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
This paper builds a neural network called TALNet, which is the first system to reach state-of-the-art audio tagging performance on Audio Set, while exhibiting strong localization performance on the DCASE 2017 challenge at the same time.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
- Computer Science, INTERSPEECH
- 2019
This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
Unsupervised Learning of Semantic Audio Representations
- Computer Science, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This work considers several class-agnostic semantic constraints that apply to unlabeled nonspeech audio and proposes low-dimensional embeddings of the input spectrograms that recover 41% and 84% of the performance of their fully-supervised counterparts when applied to downstream query-by-example sound retrieval and sound event classification tasks, respectively.
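As an illustration of the adaptive pooling idea summarized in the "Adaptive Pooling Operators for Weakly Labeled Sound Event Detection" entry above, the following is a minimal sketch of a softmax-weighted pooling operator with a scalar parameter `alpha`. The function name `autopool`, the plain-Python list interface, and the example probabilities are assumptions made for illustration; they are not taken from the listed paper's code, and in practice `alpha` would typically be learned per class rather than set by hand.

```python
import math


def autopool(frame_probs, alpha):
    """Pool per-frame probabilities into one clip-level probability.

    Each frame gets a weight softmax(alpha * p) over the frames, and the
    result is the weighted sum of the frame probabilities.
    alpha = 0 gives the unweighted mean; a large alpha approaches the max.
    """
    scaled = [alpha * p for p in frame_probs]
    peak = max(scaled)  # subtract the peak for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * p for w, p in zip(weights, frame_probs))


# Hypothetical frame-level probabilities for one event class in one clip.
probs = [0.1, 0.2, 0.9]
for a in (0.0, 1.0, 1000.0):
    print(a, autopool(probs, a))
```

With `alpha = 0.0` the sketch behaves like mean pooling, and with a very large `alpha` it behaves like max pooling, which is the interpolation between common pooling operators that the summary refers to.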