• Corpus ID: 52405983

DCASE 2018 Challenge baseline with convolutional neural networks

@article{Kong2018DCASE2C,
  title={DCASE 2018 Challenge baseline with convolutional neural networks},
  author={Qiuqiang Kong and Turab Iqbal and Yong Xu and Wenwu Wang and Mark D. Plumbley},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.00773}
}
The Detection and Classification of Acoustic Scenes and Events (DCASE) is a well-known IEEE AASP challenge consisting of a number of audio classification and sound event detection tasks. […] The baseline source code contains the implementation of convolutional neural networks (CNNs), including AlexNetish and VGGish, networks originating from computer vision. We researched how the performance varies from task to task with the same configuration of neural networks. Experiments show that the deeper…

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
TLDR
This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.
Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization
TLDR
A convolutional neural network transformer (CNN-Transformer) is proposed for audio tagging and SED, and it is shown that the CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN).
DCASE 2019: CNN depth analysis with different channel inputs for Acoustic Scene Classification
TLDR
The proposed framework, based on log-mel spectrogram representations and VGG-based convolutional neural networks, outperforms the baseline system by 14.34 percentage points and is important for the implementation of real-time audio recognition and classification systems on edge devices.
A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling
TLDR
A network architecture designed mainly for audio tagging, which can also be used for weakly supervised acoustic event detection (AED), is proposed; it consists of a modified DenseNet as the feature extractor and a global average pooling (GAP) layer to predict frame-level labels at inference time.
Cure Dataset: Ladder Networks for Audio Event Classification
TLDR
The CURE dataset, which contains a curated set of specific audio events most relevant for people with hearing loss, is established, and the superiority of the Ladder network over ELM and SVM classifiers in terms of robustness and increased classification accuracy is shown.
Weakly Labelled Audio Tagging Via Convolutional Networks with Spatial and Channel-Wise Attention
TLDR
A novel attention mechanism, namely spatial and channel-wise attention (SCA), is proposed; it can be employed in any CNN seamlessly with affordable overhead and is trainable in an end-to-end fashion.
Cross-Modal Spectrum Transformation Network for Acoustic Scene Classification
TLDR
An acoustic spectrum transformation network where traditional log-mel spectrums are transformed into imagined visual features (IVF) is introduced, where the method outperforms other spectrum features, especially for unseen environments.
AUDIO TAGGING WITH MINIMAL SUPERVISION BASED ON MEAN TEACHER FOR DCASE 2019 CHALLENGE TASK 2 Technical Report
TLDR
The mean-teacher-based audio tagging system applied to Task 2 of the DCASE 2019 challenge, which evaluates systems for audio tagging with noisy labels and minimal supervision, is described along with its performance.
DD-CNN: Depthwise Disout Convolutional Neural Network for Low-complexity Acoustic Scene Classification
TLDR
Experimental results demonstrate that the proposed Depthwise Disout Convolutional Neural Network can learn discriminative acoustic characteristics from audio fragments and effectively reduce the network complexity.
HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods
In this paper, we present a method called HODGEPODGE for large-scale detection of sound events using weakly labeled, synthetic, and unlabeled data proposed in the Detection and

References

Deep Neural Network Baseline for DCASE Challenge 2016
TLDR
The DCASE Challenge 2016 contains tasks for Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), and audio tagging, and DNN baselines indicate that DNNs can be successful in many of these tasks, but may not always perform better than the baselines.
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly
CNN architectures for large-scale audio classification
TLDR
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
CP-JKU SUBMISSIONS FOR DCASE-2016 : A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS
TLDR
This report describes the CP-JKU team's four submissions for Task 1 (audio scene classification) of the DCASE-2016 challenge and proposes a novel i-vector extraction scheme for ASC using both left and right audio channels, along with a deep convolutional neural network architecture trained on spectrograms of audio excerpts in an end-to-end fashion.
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
TLDR
This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
A comparison of Deep Learning methods for environmental sound detection
TLDR
This work presents a comparison of several state-of-the-art Deep Learning models on the IEEE challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge task and data, classifying sounds into one of fifteen common indoor and outdoor acoustic scenes.
A multi-device dataset for urban acoustic scene classification
TLDR
The acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task are introduced, and the performance of a baseline system in the task is evaluated.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Audio Event Detection using Weakly Labeled Data
TLDR
It is shown that audio event detection using weak labels can be formulated as a multiple instance learning problem, and two frameworks for solving multiple instance learning are suggested, one based on support vector machines and the other on neural networks.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.