Corpus ID: 220403640

Improving Sound Event Detection in Domestic Environments using Sound Separation

@inproceedings{Turpault2020ImprovingSE,
  title={Improving Sound Event Detection in Domestic Environments using Sound Separation},
  author={Nicolas Turpault and Scott Wisdom and Hakan Erdogan and John R. Hershey and Romain Serizel and Eduardo Fonseca and Prem Seetharaman and Justin Salamon},
  booktitle={DCASE},
  year={2020}
}
Performing sound event detection on real-world recordings often implies dealing with overlapping target sound events and non-target sounds, also referred to as interference or noise. Until now, these problems have mainly been tackled at the classifier level. We propose to use sound separation as a pre-processing step for sound event detection. In this paper we start from a sound separation model trained on the Free Universal Sound Separation dataset and the DCASE 2020 task 4 sound event detection baseline… 

Citations

Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes
  • Nicolas Turpault, R. Serizel, J. Salamon
  • Physics, Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
TLDR
It is shown that temporal localization of sound events remains a challenge for SED systems and that reverberation and non-target sound events severely degrade system performance.
Analysis of the Sound Event Detection Methods and Systems
TLDR
This work presents a number of problems associated with developing sound event detection systems, such as variation across environments and sound categories, overlapping audio events, and unreliable training data.
Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection
TLDR
A novel selective pseudo-labeling approach, termed SPL, is proposed to produce high-confidence separated target events from blind sound separation outputs; these are then used to fine-tune the original SED model, pre-trained on the sound mixtures, in a multi-objective learning style.
Adaptive Focal Loss with Data Augmentation for Semi-Supervised Sound Event Detection (Technical Report)
TLDR
This technical report describes the system submitted to DCASE 2021 Task 4 (sound event detection and separation in domestic environments) and proposes several methods to improve performance, such as SpecAugment data augmentation, an adaptive focal loss, and event-specific post-processing.
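The report's adaptive focal loss is not specified on this page. For orientation, the standard binary focal loss it builds on can be sketched for multi-label, frame-level SED targets. This is a minimal NumPy sketch, not the report's adaptive variant; the function name and the `gamma`/`alpha` defaults are assumptions.

```python
import numpy as np

def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean standard binary focal loss over all frames and classes.

    probs:   predicted per-class probabilities, shape (frames, classes)
    targets: 0/1 strong labels, same shape
    Note: this is the standard (non-adaptive) focal loss.
    """
    p = np.clip(probs, eps, 1.0 - eps)
    # p_t is the probability the model assigns to the true label
    p_t = np.where(targets == 1, p, 1.0 - p)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights easy, already well-classified frames
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return float(loss.mean())
```

With `gamma=0` and `alpha=0.5` the loss reduces to half the binary cross-entropy; increasing `gamma` shifts the training signal toward hard frames.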
A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
TLDR
A benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection, is proposed. Results show that systems adapted to provide coarse segmentation outputs are more robust to variations in the target-to-non-target signal-to-noise ratio and in the temporal localization of the original event.
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
TLDR
This paper introduces a target sound extraction (TSE) framework, SoundBeam, that combines the advantages of both sound-class-label-based and enrollment-based approaches, and performs an extensive evaluation of the different TSE schemes using synthesized and real mixtures, which shows the potential of SoundBeam.
Improving Sound Event Detection Metrics: Insights from DCASE 2020
TLDR
This paper compares conventional event-based and segment-based criteria against the intersection-based criterion of the Polyphonic Sound Detection Score (PSDS), over a selection of systems from DCASE 2020 Challenge Task 4.
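Segment-based criteria compare reference and estimated activity on a fixed time grid instead of matching whole events. A minimal sketch of a segment-based micro-averaged F1, assuming events are `(onset, offset, label)` tuples in seconds and a 1-second segment length; the function names are hypothetical, and this is neither the `sed_eval` nor the PSDS implementation (a segment counts as active for a label if any part of an event overlaps it).

```python
import math

def to_segments(events, seg_len, n_segments):
    """Map (onset, offset, label) events to a set of active (segment_index, label) pairs."""
    active = set()
    for onset, offset, label in events:
        first = int(math.floor(onset / seg_len))
        last = int(math.ceil(offset / seg_len))  # exclusive upper bound
        for k in range(max(first, 0), min(last, n_segments)):
            active.add((k, label))
    return active

def segment_f1(reference, estimated, duration, seg_len=1.0):
    """Micro-averaged segment-based F1 over all classes."""
    n_segments = int(math.ceil(duration / seg_len))
    ref = to_segments(reference, seg_len, n_segments)
    est = to_segments(estimated, seg_len, n_segments)
    tp = len(ref & est)   # segment/label pairs active in both
    fp = len(est - ref)   # predicted but not in the reference
    fn = len(ref - est)   # missed reference activity
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For a reference `Dog` event on 0 to 2 s and a prediction on 1 to 3 s in a 4 s clip, one of two reference segments is hit and one of two predicted segments is correct, giving F1 = 0.5. An event-based criterion with a tight onset collar would instead count the prediction as a miss plus a false alarm.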
What’s all the Fuss about Free Universal Sound Separation Data?
  • Scott Wisdom, Hakan Erdogan, J. Hershey
  • Computer Science, Physics
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
TLDR
An open-source baseline separation model, based on an improved time-domain convolutional network (TDCN++), is introduced that can separate a variable number of sources in a mixture; it is evaluated in terms of scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources.
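SI-SNRi measures how much a separated estimate improves on the unprocessed mixture in scale-invariant SNR. A minimal NumPy sketch of the commonly used definition (zero-mean signals, estimate projected onto the reference); the function names and the `eps` stabiliser are assumptions, and this is not the FUSS baseline code.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: the scaled target component
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the separated estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because the target is obtained by projection, rescaling the estimate does not change its SI-SNR, so a separator is not rewarded or penalised for output gain.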
Self-Supervised Learning from Automatically Separated Sound Scenes
TLDR
This paper explores the use of unsupervised automatic sound separation to decompose unlabeled sound scenes into multiple semantically-linked views for use in self-supervised contrastive learning and finds that learning to associate input mixtures with their automatically separated outputs yields stronger representations than past approaches that use the mixtures alone.

References

Showing 1-10 of 28 references
Supervised model training for overlapping sound events based on unsupervised source separation
TLDR
Two iterative approaches based on the EM algorithm are proposed to select the stream most likely to contain the target sound, yielding an increase of 8 percentage points in detection accuracy.
TUT database for acoustic scene classification and sound event detection
TLDR
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of baseline supervised acoustic scene classification and sound event detection systems using mel-frequency cepstral coefficients and Gaussian mixture models are presented.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
TLDR
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which combines part of the previous year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
Training Sound Event Detection on a Heterogeneous Dataset
TLDR
This work performs a detailed analysis of the DCASE 2020 task 4 sound event detection baseline with regard to several aspects, such as the type of data used for training, the parameters of the mean teacher, and the transformations applied while generating the synthetic soundscapes.
Source Separation with Weakly Labelled Data: an Approach to Computational Auditory Scene Analysis
TLDR
This work proposes a source separation framework trained with weakly labelled data that can separate 527 kinds of sound classes from AudioSet within a single system.
Sound Event Detection in Synthetic Domestic Environments
TLDR
A comparative analysis of the performance of state-of-the-art sound event detection systems is presented, based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow careful control of different soundscape characteristics.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations, where there are typically multiple sound sources.
Detection of overlapping acoustic events using a temporally-constrained probabilistic model
TLDR
Results show that the proposed system outperforms several state-of-the-art methods for overlapping acoustic event detection on the same task, using both frame-based and event-based metrics, and is robust to varying event density and noise levels.
Scaper: A library for soundscape synthesis and augmentation
TLDR
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined “specification”, increasing the variability of the output.
Overlapping sound event detection with supervised Nonnegative Matrix Factorization
TLDR
The proposed supervised NMF-based system improves performance over the baseline and the submitted systems, and a general β-divergence version of the nonnegative task-driven dictionary learning model is proposed.
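In supervised NMF for event detection, a dictionary of spectral atoms is learned per class beforehand, and at test time only the activations are estimated with the dictionary held fixed; summed per-class activations then act as detection scores. A minimal sketch of that activation step using the standard multiplicative update for the β-divergence; it is not the paper's task-driven dictionary learning model, and the function names and the atom-to-class mapping are hypothetical.

```python
import numpy as np

def nmf_activations(V, W, beta=1.0, n_iter=200, eps=1e-10, seed=0):
    """Estimate nonnegative activations H with a fixed dictionary W so that V ≈ W @ H.

    V: nonnegative spectrogram (freq_bins, frames); W: dictionary (freq_bins, atoms).
    Multiplicative update for the beta-divergence
    (beta=2: Euclidean, beta=1: Kullback-Leibler, beta=0: Itakura-Saito).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        numerator = W.T @ (WH ** (beta - 2) * V)
        denominator = W.T @ (WH ** (beta - 1)) + eps
        H *= numerator / denominator  # stays nonnegative by construction
    return H

def class_activity(H, atom_labels, n_classes):
    """Sum the activations of the atoms belonging to each class -> (n_classes, frames)."""
    activity = np.zeros((n_classes, H.shape[1]))
    for atom, label in enumerate(atom_labels):
        activity[label] += H[atom]
    return activity
```

Thresholding each row of `class_activity` gives frame-level decisions, and because activations of different classes add, overlapping events can be active in the same frame.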