COMBINED SOUND EVENT DETECTION AND SOUND EVENT SEPARATION NETWORKS FOR DCASE 2020 TASK 4 Technical Report
@inproceedings{Chen2020COMBINEDSE,
  title={Combined Sound Event Detection and Sound Event Separation Networks for {DCASE} 2020 Task 4 Technical Report},
  author={You-Siang Chen and Ziheng Lin and Shangwen Li and Chih-Yuan Koh and Mingsian Robin Bai and Jen-Tzung Chien and Yi-Wen Liu},
  year={2020}
}
In this paper, we propose a hybrid neural network (NN) to handle the tasks of sound event separation (SES) and sound event detection (SED) in Task 4 of the DCASE 2020 challenge. The convolutional time-domain audio separation network (Conv-TasNet) is employed to extract the foreground sound events defined in the challenge. By comparing the baseline SED network under various training strategies, we demonstrate that the SES network effectively enhances SED performance in terms of…
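The separation-then-detection idea can be sketched minimally. In this hypothetical sketch (not the authors' code), `separate` and `sed_probs` are stand-ins for the Conv-TasNet separator and the SED network; frame-level event probabilities from the mixture and each separated stem are fused by a max:

```python
# Illustrative sketch, assuming a separator and an SED network exist.
# `separate` and `sed_probs` are hypothetical stand-ins, not real APIs.

def separate(mixture):
    # Stand-in separator: "recovers" two stems by simple scaling.
    # A real system would run Conv-TasNet here.
    return [[0.5 * x for x in mixture], [0.5 * x for x in mixture]]

def sed_probs(audio):
    # Stand-in SED network: maps each frame to a pseudo-probability.
    return [min(1.0, abs(x)) for x in audio]

def fused_sed(mixture):
    """Max-fuse frame-wise event probabilities over mixture and stems."""
    streams = [mixture] + separate(mixture)
    prob_tracks = [sed_probs(s) for s in streams]
    return [max(frame) for frame in zip(*prob_tracks)]

probs = fused_sed([0.2, 0.9, 0.1])
```

Max-fusion is only one plausible combination rule; the stems could equally be stacked as extra input channels to the SED network.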
References
Showing 1-10 of 14 references
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
- DCASE, 2019
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
MEAN TEACHER WITH DATA AUGMENTATION FOR DCASE 2019 TASK 4 Technical Report
- 2019
A mean-teacher model with a convolutional neural network (CNN) and a recurrent neural network (RNN), together with data augmentation and a median window tuned for each class based on prior knowledge, is proposed.
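The per-class median window mentioned above smooths binary frame-level activity before event boundaries are extracted. A minimal sketch, assuming odd window lengths tuned separately for each class:

```python
from statistics import median

def median_smooth(frames, win):
    """Median-filter a binary frame-activity sequence with an odd window.

    Per-class window lengths (as in the referenced system) are applied
    by calling this once per class with that class's tuned `win`.
    """
    assert win % 2 == 1, "window length must be odd"
    half = win // 2
    out = []
    for i in range(len(frames)):
        lo, hi = max(0, i - half), min(len(frames), i + half + 1)
        out.append(int(median(frames[lo:hi])))
    return out

# A lone spurious frame is removed; a solid three-frame event survives.
smoothed = median_smooth([0, 1, 0, 0, 1, 1, 1, 0], 3)
```

Short windows preserve brief events (e.g. dog barks); long windows suit sustained events (e.g. vacuum cleaner), which is why a per-class tuning helps.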
Universal Sound Separation
- IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019
A dataset of mixtures containing arbitrary sounds is developed, and the best methods produce an improvement in scale-invariant signal-to-distortion ratio of over 13 dB for speech/non-speech separation and close to 10 dB for universal sound separation.
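The scale-invariant signal-to-distortion ratio (SI-SDR) quoted above can be computed by projecting the estimate onto the reference, so that rescaling the estimate leaves the score unchanged. A pure-Python sketch of the metric:

```python
import math

def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB.

    Projects the estimate onto the reference signal; the residual after
    projection counts as distortion, so any global rescaling of the
    estimate leaves the score unchanged.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    target = [dot / ref_energy * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    t_energy = sum(t * t for t in target)
    n_energy = sum(n * n for n in noise)
    return 10.0 * math.log10(t_energy / n_energy)

score = si_sdr([1.1, 2.1, 2.9], [1.0, 2.0, 3.0])
```

An "improvement of over 13 dB" in the paper refers to the gain in this quantity over using the unprocessed mixture as the estimate.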
Sound Event Detection in Synthetic Domestic Environments
- IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
A comparative analysis of the performance of state-of-the-art sound event detection systems, based on the results of Task 4 of the DCASE 2019 challenge, in which submitted systems were evaluated on a series of synthetic soundscapes that allow careful control of different soundscape characteristics.
Metrics for Polyphonic Sound Event Detection
- 2016
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources…
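Segment-based evaluation, one family of metrics discussed in that paper, compares the sets of active classes in reference and estimated annotations segment by segment. A small sketch computing segment-based F1 and error rate (the exact definitions follow the Mesaros et al. formulation; the helper name is ours):

```python
def segment_metrics(ref, est):
    """Segment-based F1 and error rate for polyphonic SED.

    `ref` and `est` are lists of per-segment label sets. The error rate
    decomposes mismatches into substitutions, insertions, and deletions,
    normalized by the number of reference labels.
    """
    tp = fp = fn = subs = ins = dele = 0
    for r, e in zip(ref, est):
        tp += len(r & e)
        fp += len(e - r)
        fn += len(r - e)
        s = min(len(r - e), len(e - r))
        subs += s
        ins += len(e - r) - s
        dele += len(r - e) - s
    n_ref = sum(len(r) for r in ref)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    er = (subs + ins + dele) / n_ref if n_ref else 0.0
    return f1, er

f1, er = segment_metrics(
    [{"dog", "speech"}, {"speech"}, set()],
    [{"dog"}, {"speech", "cat"}, {"cat"}],
)
```

Note that the error rate can exceed 1.0 when insertions outnumber reference labels, which is why it is usually reported alongside F1.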
Scaper: A library for soundscape synthesis and augmentation
- IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined “specification”, increasing the variability of the output.
Single-Channel Multi-Speaker Separation Using Deep Clustering
- INTERSPEECH, 2016
This paper significantly improves upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal-to-distortion ratio (SDR) of 10.3 dB over the baseline, and produces unprecedented performance on a challenging speech separation task.
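The deep clustering objective behind that system trains embeddings so that time-frequency bins belonging to the same source cluster together. A pure-Python sketch of the loss ||VVᵀ − YYᵀ||²_F, where V holds per-bin embeddings and Y one-hot source assignments (real systems compute this on batched GPU tensors):

```python
def dc_loss(V, Y):
    """Deep clustering objective ||V V^T - Y Y^T||_F^2.

    V: one embedding row per time-frequency bin.
    Y: one-hot source-assignment row per bin.
    The loss is zero when bins of the same source share an embedding
    and bins of different sources have orthogonal embeddings.
    """
    def gram(M):
        return [[sum(a * b for a, b in zip(r1, r2)) for r2 in M]
                for r1 in M]
    A, B = gram(V), gram(Y)
    return sum((A[i][j] - B[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A)))
```

At inference, sources are recovered by clustering the embeddings (e.g. with k-means) rather than by evaluating this loss.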
Speaker-Independent Speech Separation With Deep Attractor Network
- IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018
This work proposes a novel deep learning framework for speech separation that uses a neural network to project the time-frequency representation of the mixture signal into a high-dimensional embedding space. Three methods for finding the attractors for each source in that space are proposed, and their advantages and limitations are compared.
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks.
- 2017
In this paper we propose the utterance-level Permutation Invariant Training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker-independent…
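The core of permutation invariant training is to score the estimated sources against the references under every speaker permutation and keep the best match, so training does not depend on an arbitrary output ordering. A minimal sketch of the uPIT loss:

```python
from itertools import permutations

def upit_loss(estimates, references):
    """Utterance-level permutation-invariant training loss (sketch).

    Computes per-source MSE under every assignment of estimated outputs
    to reference sources and returns the minimum total, which is the
    quantity minimized during training.
    """
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    n = len(estimates)
    return min(
        sum(mse(estimates[i], references[p[i]]) for i in range(n))
        for p in permutations(range(len(references)))
    )
```

The "utterance-level" part means one permutation is chosen for the whole utterance rather than per frame, avoiding speaker-swapping between frames; the factorial cost over permutations is acceptable for the small source counts typical of speech separation.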