Corpus ID: 220403357

Training Sound Event Detection on a Heterogeneous Dataset

@article{Turpault2020TrainingSE,
  title={Training Sound Event Detection on a Heterogeneous Dataset},
  author={Nicolas Turpault and Romain Serizel},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.03931}
}
Training a sound event detection algorithm on a heterogeneous dataset including both recorded and synthetic soundscapes that can have various labeling granularities is a non-trivial task that can lead to systems requiring several technical choices. These technical choices are often passed from one system to another without being questioned. We propose to perform a detailed analysis of the DCASE 2020 task 4 sound event detection baseline with regard to several aspects such as the type of data used…
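
To make the heterogeneity concrete: the DCASE 2020 task 4 baseline discussed here trains a single model on synthetic clips with strong (frame-level) labels, recorded clips with weak (clip-level) labels, and unlabeled clips. Below is a minimal, hedged PyTorch sketch of one way such a combined objective can be written, with a mean-teacher-style consistency term for the unlabeled part; the tensor shapes, pooling assumptions, and weighting are illustrative and not taken from the paper.

```python
# Hedged sketch: combining strongly labeled synthetic clips, weakly labeled
# recorded clips, and unlabeled clips in one training objective.
# Shapes and the weighting scheme are illustrative assumptions.
import torch
import torch.nn.functional as F


def heterogeneous_sed_loss(
    strong_pred, strong_target,   # (B, T, C) frame-level predictions / labels (synthetic)
    weak_pred, weak_target,       # (B, C)    clip-level predictions / labels (weakly labeled)
    student_pred, teacher_pred,   # (B, T, C) student / EMA-teacher outputs on unlabeled clips
    consistency_weight=1.0,
):
    # Strongly labeled synthetic clips: frame-wise binary cross-entropy.
    strong_loss = F.binary_cross_entropy(strong_pred, strong_target)
    # Weakly labeled recorded clips: clip-level binary cross-entropy
    # (weak_pred is assumed to be already pooled over time).
    weak_loss = F.binary_cross_entropy(weak_pred, weak_target)
    # Unlabeled clips: mean-teacher consistency between student and teacher.
    consistency_loss = F.mse_loss(student_pred, teacher_pred.detach())
    return strong_loss + weak_loss + consistency_weight * consistency_loss


if __name__ == "__main__":
    B, T, C = 4, 156, 10  # illustrative batch size, frames, event classes
    loss = heterogeneous_sed_loss(
        torch.rand(B, T, C), torch.randint(0, 2, (B, T, C)).float(),
        torch.rand(B, C), torch.randint(0, 2, (B, C)).float(),
        torch.rand(B, T, C), torch.rand(B, T, C),
    )
    print(loss.item())
```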

Citations

Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes
  • Nicolas Turpault, R. Serizel, J. Salamon
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
TLDR: It is shown that temporal localization of sound events remains a challenge for SED systems and that reverberation and non-target sound events severely degrade system performance.

A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge
TLDR: This work proposes a multi-resolution analysis for feature extraction in sound event detection, hypothesizing that different resolutions can be more adequate for the detection of different sound event categories, and that combining the information provided by multiple resolutions could improve the performance of sound event detection systems.

Improving Sound Event Detection in Domestic Environments using Sound Separation
TLDR: This paper starts from a sound separation model trained on the Free Universal Sound Separation dataset and the DCASE 2020 task 4 sound event detection baseline, and explores different methods to combine the separated sound sources and the original mixture within the sound event detection system.

Sound Event Detection with Cross-Referencing Self-Training
TLDR: This approach takes advantage of semi-supervised training using pseudo-labels from a balanced student-teacher model, and outperforms the DCASE 2021 challenge baseline in terms of Polyphonic Sound Detection Score.

Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
TLDR: This study proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training, and explores post-processing that extracts sound intervals from network predictions for further improvement in sound event detection performance.

A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
TLDR: A benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection, is proposed; results show that systems adapted to provide coarse segmentation outputs are more robust to different target-to-non-target signal-to-noise ratios and to the time localization of the original events.

An analysis of Sound Event Detection under acoustic degradation using multi-resolution systems
TLDR: This paper analyzes the performance of sound event detection systems under diverse acoustic conditions such as high-pass or low-pass filtering, clipping or dynamic range compression, and provides insights on the benefits of this multi-resolution approach in different acoustic settings.

Self-Training for Sound Event Detection in Audio Mixtures
TLDR: A self-training technique is proposed to leverage unlabeled datasets in supervised learning using pseudo-label estimation and a dual-term objective function: a classification loss for the original labels and an expectation loss for the pseudo labels.
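
As a rough illustration of the dual-term objective described in this summary, here is a short, hedged sketch; treating the "expectation loss" as a soft binary cross-entropy against teacher-estimated pseudo-label probabilities is an assumption made for the example, not the paper's exact formulation.

```python
# Hedged sketch of a generic dual-term self-training objective for SED:
# supervised loss on originally labeled clips plus a loss against pseudo
# labels estimated on unlabeled clips. Modeling the "expectation loss" as
# soft binary cross-entropy is an assumption for illustration only.
import torch
import torch.nn.functional as F


def dual_term_loss(labeled_pred, labels, unlabeled_pred, pseudo_probs, pseudo_weight=0.5):
    # Classification loss on clips with original ground-truth labels.
    classification_loss = F.binary_cross_entropy(labeled_pred, labels)
    # Loss against soft pseudo labels (e.g. produced by a teacher model),
    # treated as fixed targets via detach().
    pseudo_loss = F.binary_cross_entropy(unlabeled_pred, pseudo_probs.detach())
    return classification_loss + pseudo_weight * pseudo_loss


if __name__ == "__main__":
    B, C = 8, 10  # illustrative batch size and number of event classes
    loss = dual_term_loss(
        torch.rand(B, C), torch.randint(0, 2, (B, C)).float(),
        torch.rand(B, C), torch.rand(B, C),
    )
    print(loss.item())
```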

Peer Collaborative Learning for Polyphonic Sound Event Detection
This paper describes that semi-supervised learning called peer collaborative learning (PCL) can be applied to the polyphonic sound event detection (PSED) task, which is one of the tasks in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge.

SOUND EVENT DETECTION SYSTEM FOR DCASE 2021 CHALLENGE Technical Report
This paper presents the systems proposal for the DCASE 2021 challenge Task 4 (Sound event detection and separation in domestic environments). The aim is to provide the event time localization…

References

(Showing 1-10 of 27 references)

Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
TLDR: The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes a part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.

Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
TLDR: This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.

Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
TLDR: This paper treats SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality, and develops a family of adaptive pooling operators (referred to as auto-pool) which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
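
To make the interpolation idea concrete, here is a minimal NumPy sketch of an auto-pool-style operator as described in the summary above: a softmax-weighted average of per-frame probabilities whose scalar parameter moves it between mean pooling and max pooling. The exact parameterization in the paper may differ.

```python
# Hedged sketch of auto-pool-style aggregation (softmax-weighted mean):
# alpha = 0 reduces to mean pooling, large alpha approaches max pooling.
import numpy as np


def auto_pool(frame_probs, alpha):
    """Aggregate per-frame event probabilities (shape (T,)) into one clip-level score."""
    weights = np.exp(alpha * frame_probs)
    weights /= weights.sum()
    return float(np.sum(weights * frame_probs))


frame_probs = np.array([0.1, 0.2, 0.9, 0.8, 0.1])
print(auto_pool(frame_probs, alpha=0.0))   # ~0.42, behaves like mean pooling
print(auto_pool(frame_probs, alpha=50.0))  # ~0.90, behaves like max pooling
```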

Sound Event Detection in Synthetic Domestic Environments
TLDR: A comparative analysis of the performance of state-of-the-art sound event detection systems, based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow careful control of different soundscape characteristics.

TUT database for acoustic scene classification and sound event detection
TLDR: The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.

Scaper: A library for soundscape synthesis and augmentation
TLDR: Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined "specification", to increase the variability of the output.
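
For context, this is roughly what such a probabilistic soundscape specification looks like with Scaper; the folder paths, labels, and distribution parameters below are illustrative assumptions, not the recipe used for the dataset discussed in this paper.

```python
# Hedged sketch of Scaper-style soundscape synthesis; paths and distribution
# parameters are illustrative assumptions.
import scaper

sc = scaper.Scaper(duration=10.0, fg_path="foreground_events", bg_path="backgrounds")
sc.ref_db = -50  # reference loudness for the background

# Background drawn at random from the background folder.
sc.add_background(label=("choose", []),
                  source_file=("choose", []),
                  source_time=("const", 0))

# One foreground event whose label, onset, duration and SNR are sampled
# from the distributions given in the specification.
sc.add_event(label=("choose", []),
             source_file=("choose", []),
             source_time=("const", 0),
             event_time=("uniform", 0, 8),
             event_duration=("truncnorm", 2.0, 1.0, 0.5, 4.0),
             snr=("normal", 6, 3),
             pitch_shift=None,
             time_stretch=None)

# Each call to generate() samples a new soundscape plus its JAMS annotation.
sc.generate("soundscape.wav", "soundscape.jams")
```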

Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously.
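
As a concrete illustration of the segment-based evaluation such metrics define, here is a small self-contained Python sketch that compares reference and estimated event lists on a fixed time grid and computes precision, recall, and F1 over (segment, class) pairs; the one-second resolution and the toy events are arbitrary choices for the example (the sed_eval toolbox provides full reference implementations).

```python
# Hedged sketch of segment-based SED evaluation on a 1-second grid;
# events, classes, and resolution are toy examples.
import numpy as np

CLASSES = ["speech", "dog"]


def events_to_grid(events, duration, resolution=1.0):
    """Convert (onset, offset, label) events into a boolean (segments, classes) grid."""
    n_segments = int(np.ceil(duration / resolution))
    grid = np.zeros((n_segments, len(CLASSES)), dtype=bool)
    for onset, offset, label in events:
        first = int(np.floor(onset / resolution))
        last = int(np.ceil(offset / resolution))
        grid[first:last, CLASSES.index(label)] = True
    return grid


reference = [(0.5, 3.2, "speech"), (4.0, 6.0, "dog")]
estimated = [(0.7, 2.9, "speech"), (5.0, 7.0, "dog")]

ref = events_to_grid(reference, duration=10.0)
est = events_to_grid(estimated, duration=10.0)

tp = np.sum(ref & est)    # active in both reference and estimate
fp = np.sum(~ref & est)   # estimated active where reference is inactive
fn = np.sum(ref & ~est)   # reference active but missed by the estimate

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```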

Detection and Classification of Acoustic Scenes and Events
TLDR: The state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events, is reported on.

Audio Set: An ontology and human-labeled dataset for audio events
TLDR: The creation of Audio Set is described: a large-scale dataset of manually annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.

Event-based Video Retrieval Using Audio
TLDR: Several systems for performing MED using only audio data are presented, the results of each system on the TRECVID MED 2011 development dataset are reported, and the strengths and weaknesses of each approach are compared.