Improving Sound Event Detection in Domestic Environments using Sound Separation
@inproceedings{Turpault2020ImprovingSE,
  title={Improving Sound Event Detection in Domestic Environments using Sound Separation},
  author={Nicolas Turpault and Scott Wisdom and Hakan Erdogan and John R. Hershey and Romain Serizel and Eduardo Fonseca and Prem Seetharaman and Justin Salamon},
  booktitle={DCASE},
  year={2020}
}
Performing sound event detection on real-world recordings often implies dealing with overlapping target sound events and non-target sounds, also referred to as interference or noise. Until now, these problems have mainly been tackled at the classifier level. We propose to use sound separation as a pre-processing step for sound event detection. In this paper, we start from a sound separation model trained on the Free Universal Sound Separation dataset and the DCASE 2020 task 4 sound event detection baseline…
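As a rough illustration of what "separation as pre-processing" can look like, the sketch below runs a detector on both the mixture and each separated source, then fuses the frame-level class probabilities. The truncated abstract does not say how the separated outputs are combined, so the fusion rule here (a weighted average of the mixture prediction and the per-class maximum over sources) is purely an assumption, and `toy_separate`, `toy_detect`, and `detect_with_separation` are hypothetical stand-ins rather than the authors' models.

```python
from typing import Callable, List

Signal = List[float]            # mono waveform samples
FrameProbs = List[List[float]]  # probabilities indexed as [frame][class]


def toy_separate(mixture: Signal, n_sources: int = 2) -> List[Signal]:
    """Hypothetical separator: splits the mixture into equal parts that sum back to it."""
    return [[x / n_sources for x in mixture] for _ in range(n_sources)]


def toy_detect(audio: Signal, n_frames: int = 4, n_classes: int = 2) -> FrameProbs:
    """Hypothetical detector: maps per-frame energy to a pseudo-probability in [0, 1)."""
    hop = max(1, len(audio) // n_frames)
    probs = []
    for i in range(n_frames):
        frame = audio[i * hop:(i + 1) * hop]
        energy = sum(x * x for x in frame) / max(1, len(frame))
        p = energy / (1.0 + energy)
        probs.append([p] * n_classes)
    return probs


def detect_with_separation(
    mixture: Signal,
    separate: Callable[[Signal], List[Signal]],
    detect: Callable[[Signal], FrameProbs],
    weight: float = 0.5,
) -> FrameProbs:
    """Fuse detector outputs on the mixture and on its separated sources.

    Assumed fusion rule (not taken from the paper):
    weight * P(mixture) + (1 - weight) * max over sources of P(source).
    """
    mix_probs = detect(mixture)
    per_source = [detect(s) for s in separate(mixture)]
    fused = []
    for t, frame in enumerate(mix_probs):
        row = []
        for c, p_mix in enumerate(frame):
            p_src = max(src[t][c] for src in per_source)
            row.append(weight * p_mix + (1.0 - weight) * p_src)
        fused.append(row)
    return fused


# Usage: silence followed by a louder segment.
mixture = [0.0] * 8 + [1.0] * 8
fused = detect_with_separation(mixture, toy_separate, toy_detect)
```

In a real system the two placeholder functions would be replaced by a trained separation model and a trained detector; only the wiring between them is shown here.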
17 Citations
Sound Event Detection and Separation: A Benchmark on DESED Synthetic Soundscapes
- Physics, Computer Science
- ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
It is shown that temporal localization of sound events remains a challenge for SED systems and that reverberation and non-target sound events severely degrade system performance.
ANALYSIS OF THE SOUND EVENT DETECTION METHODS AND SYSTEMS
- Computer Science
- Advanced Information Systems
- 2022
A number of problems associated with the development of sound event detection systems are presented, such as the deviation for each environment and each sound category, overlapping audio events, and unreliable training data.
Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection
- Computer Science
- ArXiv
- 2022
A novel selective pseudo-labeling approach, termed SPL, is proposed to produce high-confidence separated target events from blind sound separation outputs, which are then used to fine-tune the original SED model that was pre-trained on the sound mixtures in a multi-objective learning style.
ADAPTIVE FOCAL LOSS WITH DATA AUGMENTATION FOR SEMI-SUPERVISED SOUND EVENT DETECTION Technical Report
- Computer Science
- 2021
This technical report describes the system submitted to DCASE 2021 Task 4 (sound event detection and separation in domestic environments) and proposes methods such as SpecAugment data augmentation, adaptive focal loss, and event-specific post-processing to improve performance.
A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
- Computer Science
- ArXiv
- 2022
A benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection, is proposed. Results show that systems adapted to provide coarse segmentation outputs are more robust to different target to non-target signal-to-noise ratios and to the time localization of the original event.
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
- Computer Science
- ArXiv
- 2022
This paper introduces a target sound extraction (TSE) framework, SoundBeam, that combines the advantages of both sound-class-label-based and enrollment-based approaches, and performs an extensive evaluation of the different TSE schemes using synthesized and real mixtures, which shows the potential of SoundBeam.
Improving Sound Event Detection Metrics: Insights from DCASE 2020
- Computer Science
- ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
This paper compares conventional event-based and segment-based criteria against the intersection-based criterion of the Polyphonic Sound Detection Score (PSDS), over a selection of systems from DCASE 2020 Challenge Task 4.
Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection
- Computer Science
- Digit. Signal Process.
- 2022
What’s all the Fuss about Free Universal Sound Separation Data?
- Computer Science, Physics
- ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
An open-source baseline separation model that can separate a variable number of sources in a mixture is introduced, based on an improved time-domain convolutional network (TDCN++), that achieves scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources.
Self-Supervised Learning from Automatically Separated Sound Scenes
- Computer Science
- 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2021
This paper explores the use of unsupervised automatic sound separation to decompose unlabeled sound scenes into multiple semantically-linked views for use in self-supervised contrastive learning and finds that learning to associate input mixtures with their automatically separated outputs yields stronger representations than past approaches that use the mixtures alone.
References
Supervised model training for overlapping sound events based on unsupervised source separation
- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
Two iterative approaches based on the EM algorithm are proposed to select the stream most likely to contain the target sound, giving a reasonable increase of 8 percentage units in detection accuracy.
TUT database for acoustic scene classification and sound event detection
- Computer Science, Physics
- 2016 24th European Signal Processing Conference (EUSIPCO)
- 2016
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
- Computer Science
- DCASE
- 2019
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year, which is described in more detail.
Training Sound Event Detection on a Heterogeneous Dataset
- Computer Science, Physics
- DCASE
- 2020
This work proposes a detailed analysis of the DCASE 2020 task 4 sound event detection baseline with regard to several aspects, such as the type of data used for training, the parameters of the mean-teacher, or the transformations applied while generating the synthetic soundscapes.
Source Separation with Weakly Labelled Data: an Approach to Computational Auditory Scene Analysis
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This work proposes a source separation framework trained with weakly labelled data that can separate 527 kinds of sound classes from AudioSet within a single system.
Sound Event Detection in Synthetic Domestic Environments
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
A comparative analysis of the performance of state-of-the-art sound event detection systems based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow us to carefully control for different soundscape characteristics.
Metrics for Polyphonic Sound Event Detection
- Computer Science
- 2016
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources…
Detection of overlapping acoustic events using a temporally-constrained probabilistic model
- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
Results show that the proposed system outperforms several state-of-the-art methods for overlapping acoustic event detection on the same task, using both frame-based and event-based metrics, and is robust to varying event density and noise levels.
Scaper: A library for soundscape synthesis and augmentation
- Computer Science
- 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2017
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined “specification”, to increase the variability of the output.
Overlapping sound event detection with supervised Nonnegative Matrix Factorization
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
The proposed supervised NMF-based system improves performance over the baseline and the submitted systems, and a general β-divergence version of the nonnegative task-driven dictionary learning model is proposed.