• Corpus ID: 102352325

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

@article{Kong2019CrosstaskLF,
  title={Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems},
  author={Qiuqiang Kong and Yin Cao and Turab Iqbal and Yong Xu and Wenwu Wang and Mark D. Plumbley},
  journal={ArXiv},
  year={2019},
  volume={abs/1904.05635}
}
The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge focuses on audio tagging, sound event detection and spatial localisation. DCASE 2019 consists of five tasks: 1) acoustic scene classification, 2) audio tagging with noisy labels and minimal supervision, 3) sound event localisation and detection, 4) sound event detection in domestic environments, and 5) urban sound tagging. In this paper, we propose generic cross-task baseline systems based on convolutional… 

Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization
TLDR
A convolutional neural network transformer (CNN-Transformer) is proposed for audio tagging and SED, and it is shown that the CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN).
Sound event detection and localization based on CNN and LSTM Technical Report
TLDR
This method improves the estimation accuracy of DOA and the recognition ability of SED, using the DCASE 2019 dataset and the PyTorch deep learning toolkit.
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
TLDR
An overview of the first international evaluation on sound event localization and detection, organized as a task of the DCASE 2019 Challenge, presents in detail how the systems were evaluated and ranked and the characteristics of the best-performing systems.
Domain Adaptation Neural Network for Acoustic Scene Classification in Mismatched Conditions
TLDR
The proposed DANN-based acoustic scene classification method is evaluated on subtask B of task 1 of the DCASE 2019 ASC challenge, which is a closed-set classification problem whose audio recordings were made with mismatched devices.
TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report
TLDR
A two-stage polyphonic sound event detection and localization method that localizes and detects overlapping sound events in different environments, improves the performance of both SED and DOA estimation, and performs significantly better than the baseline method.
WDXY SUBMISSION FOR DCASE-2019 : ACOUSTIC SCENE CLASSIFICATION WITH CONVOLUTION NEURAL NETWORKS
TLDR
This paper demonstrates how convolutional neural networks are applied to DCASE 2019 task 1, acoustic scene classification: Mel spectrograms are generated from binaural audio, and five convolutional neural networks are learned adaptively.
Augmented Strategy For Polyphonic Sound Event Detection
  • Bolun Wang, Zhonghua Fu, Hao Wu
  • Computer Science
    2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
  • 2019
TLDR
An augmented strategy for polyphonic sound event classification is proposed. It includes data augmentation to enrich the training set and eliminate data imbalance, a new loss function that combines cross entropy and F-score, and model fusion to integrate the strengths of different classifiers.
TIME-FREQUENCY SEGMENTATION ATTENTION NEURAL NETWORK FOR URBAN SOUND TAGGING Technical Report
TLDR
The proposed TFSANN model is validated on the development dataset of DCASE 2019 task 5, and results for the coarse-grained and fine-grained taxonomies are reported in terms of micro area under the precision-recall curve (AUPRC), micro F1 score, and macro AUPRC.
Acoustic Scene Classification With Squeeze-Excitation Residual Networks
TLDR
Two novel squeeze-excitation blocks are proposed to improve the accuracy of a CNN-based ASC framework based on residual learning, and they exceed the performance of the baseline proposed by the DCASE organization by 13 percentage points.

References

SHOWING 1-10 OF 31 REFERENCES
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structure, and is robust to unseen DOA values, reverberation, and low-SNR scenarios.
A multi-device dataset for urban acoustic scene classification
TLDR
The acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task are introduced, and the performance of a baseline system in the task is evaluated.
Learning Sound Event Classifiers from Web Audio with Noisy Labels
TLDR
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in presence of corrupted labels.
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
TLDR
This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
TUT database for acoustic scene classification and sound event detection
TLDR
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models are presented.
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
TLDR
This work combines these two approaches in a convolutional recurrent neural network (CRNN), applies it to a polyphonic sound event detection task, and observes a considerable improvement on four different datasets consisting of everyday sound events.
Deep Neural Network Baseline for DCASE Challenge 2016
TLDR
The DCASE Challenge 2016 contains tasks for Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), and audio tagging, and DNN baselines indicate that DNNs can be successful in many of these tasks, but may not always perform better than the baselines.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
TLDR
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.