Corpus ID: 52016092

DCASE 2018 Challenge Surrey cross-task convolutional neural network baseline

@inproceedings{Kong2018DCASE2C,
  title={DCASE 2018 Challenge Surrey cross-task convolutional neural network baseline},
  author={Qiuqiang Kong and Turab Iqbal and Yong Xu and Wenwu Wang and Mark D. Plumbley},
  booktitle={DCASE},
  year={2018}
}
The Detection and Classification of Acoustic Scenes and Events (DCASE) challenge consists of five audio classification and sound event detection tasks: 1) Acoustic scene classification, 2) General-purpose audio tagging of Freesound, 3) Bird audio detection, 4) Weakly-labeled semi-supervised sound event detection and 5) Multi-channel audio classification. In this paper, we create a cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a “CNN Baseline” system. We implemented CNNs… 
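The shared pipeline the abstract describes (log-mel spectrogram in, convolutional feature extraction, pooling, classification out) can be sketched as follows. This is a minimal illustrative forward pass in plain numpy, not the authors' implementation; the layer sizes, kernel shapes, and the single conv layer are assumptions chosen for brevity.

```python
import numpy as np

def conv2d_relu(x, w, b):
    """Naive valid 2-D convolution over a (mel, time) map with ReLU.

    x: (H, W) input feature map; w: (F, kh, kw) filters; b: (F,) biases.
    Returns (F, H - kh + 1, W - kw + 1) activations.
    """
    F, kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((F, H - kh + 1, W - kw + 1))
    for f in range(F):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[f, i, j] = np.sum(x[i:i + kh, j:j + kw] * w[f]) + b[f]
    return np.maximum(out, 0.0)

def cnn_baseline_forward(logmel, conv_w, conv_b, fc_w, fc_b):
    """Log-mel spectrogram -> conv features -> global pooling -> softmax."""
    feat = conv2d_relu(logmel, conv_w, conv_b)   # (F, H', W')
    pooled = feat.mean(axis=(1, 2))              # global average pooling -> (F,)
    logits = pooled @ fc_w + fc_b                # (num_classes,)
    e = np.exp(logits - logits.max())            # stable softmax
    return e / e.sum()

# Hypothetical shapes: 64 mel bins x 100 frames, 8 filters, 10 classes.
rng = np.random.default_rng(0)
logmel = rng.standard_normal((64, 100))
conv_w = rng.standard_normal((8, 3, 3)) * 0.1
conv_b = np.zeros(8)
fc_w = rng.standard_normal((8, 10)) * 0.1
fc_b = np.zeros(10)
probs = cnn_baseline_forward(logmel, conv_w, conv_b, fc_w, fc_b)
print(probs.shape)  # (10,)
```

Because the front end (spectrogram + CNN) is task-agnostic, only the final classification layer and the pooling strategy need to change across the five tasks, which is what makes a single cross-task baseline feasible.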

Tables from this paper

DD-CNN: Depthwise Disout Convolutional Neural Network for Low-complexity Acoustic Scene Classification
TLDR
Experimental results demonstrate that the proposed Depthwise Disout Convolutional Neural Network can learn discriminative acoustic characteristics from audio fragments and effectively reduce the network complexity.
HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods
In this paper, we present a method called HODGEPODGE for large-scale detection of sound events using weakly labeled, synthetic, and unlabeled data proposed in the Detection and
Relation-guided acoustic scene classification aided with event embeddings
TLDR
A relation-guided ASC (RGASC) model to further exploit and coordinate the scene-event relation for the mutual benefit of scene and event recognition and improves the scene classification accuracy on the real-life dataset.
Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset
TLDR
A more efficient model is constructed by employing a gated linear unit (GLU) and dilated convolution to address the de-emphasis of important features and the limited receptive field, and pseudo-label-based learning for classifying target and unknown contents is proposed by adding a 'noise label' and 'noise loss' so that unknown contents can be separated as much as possible through the noise label.
Prototypical Networks for Domain Adaptation in Acoustic Scene Classification
TLDR
This work explores a metric learning approach called prototypical networks using the TUT Urban Acoustic Scenes dataset, which consists of 10 different acoustic scenes recorded across 10 cities, and concludes that metric learning is a promising approach towards addressing the domain adaptation problem in ASC.
Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training
TLDR
This work explores how to extend deep SSL to result in a new, state-of-the-art sound event detection method called Hodge and Podge, and proposes multi-hot MixMatch and composition consistency training with temporal-frequency augmentation.
Audio Tagging by Cross Filtering Noisy Labels
TLDR
This article presents a novel framework, named CrossFilter, to combat the noisy labels problem for audio tagging, and achieves state-of-the-art performance and even surpasses the ensemble models on FSDKaggle2018 dataset.
A Squeeze-and-Excitation and Transformer based Cross-task System for Environmental Sound Recognition
TLDR
An architecture named SE-Trans is presented that uses attention mechanism-based Squeeze-and-Excitation and Transformer encoder modules to learn channel-wise relationships and temporal dependencies of the acoustic features of ESR.
...
...

References

SHOWING 1-10 OF 35 REFERENCES
Deep Neural Network Baseline for DCASE Challenge 2016
TLDR
The DCASE Challenge 2016 contains tasks for Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), and audio tagging; DNN baselines indicate that DNNs can be successful in many of these tasks, but do not always outperform simpler baselines.
CNN architectures for large-scale audio classification
TLDR
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
CP-JKU Submissions for DCASE-2016: A Hybrid Approach Using Binaural I-Vectors and Deep Convolutional Neural Networks
TLDR
This report describes the 4 submissions for Task 1 (Audio scene classification) of the DCASE-2016 challenge of the CP-JKU team and proposes a novel i-vector extraction scheme for ASC using both left and right audio channels and a Deep Convolutional Neural Network architecture trained on spectrograms of audio excerpts in end-to-end fashion.
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
TLDR
This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
TLDR
This work combines these two approaches in a convolutional recurrent neural network (CRNN) and applies it on a polyphonic sound event detection task and observes a considerable improvement for four different datasets consisting of everyday sound events.
A multi-device dataset for urban acoustic scene classification
TLDR
The acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task are introduced, and the performance of a baseline system in the task is evaluated.
A comparison of Deep Learning methods for environmental sound detection
TLDR
This work presents a comparison of several state-of-the-art Deep Learning models on the IEEE challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge task and data, classifying sounds into one of fifteen common indoor and outdoor acoustic scenes.
...
...