Corpus ID: 232085297

STACKED CONVOLUTIONAL NEURAL NETWORKS FOR GENERAL-PURPOSE AUDIO TAGGING Technical Report

@inproceedings{Iqbal2018STACKEDCN,
  title={STACKED CONVOLUTIONAL NEURAL NETWORKS FOR GENERAL-PURPOSE AUDIO TAGGING Technical Report},
  author={Turab Iqbal and Qiuqiang Kong and Mark D. Plumbley},
  year={2018}
}
This technical report describes the system we used to participate in Task 2 of the DCASE 2018 challenge. The particular considerations of this task include how to handle variable-length audio samples and the presence of noisy labels. We propose a number of neural network architectures that learn from log-mel spectrogram inputs. These baseline models employ preprocessing techniques, data augmentation, loss function weighting, and pseudo-labeling to improve their performance…
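
The pseudo-labeling mentioned in the abstract typically means promoting confident model predictions on unlabeled (or untrusted) clips to training labels. A minimal numpy sketch of confidence-thresholded pseudo-labeling; the `pseudo_label` helper and the 0.9 threshold are illustrative assumptions, not the authors' exact scheme:

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Keep only clips whose top class probability exceeds the threshold,
    and use the argmax class as a hard pseudo-label.

    probs: (n_clips, n_classes) predicted class probabilities.
    Returns (kept_clip_indices, pseudo_labels)."""
    probs = np.asarray(probs)
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)
```

The retained clips would then be folded back into the training set with their pseudo-labels, usually for a second round of training.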

Citations

THUEE SYSTEM FOR DCASE 2019 CHALLENGE TASK 2 Technical Report

This paper describes a submission for the DCASE 2019 Challenge task on audio tagging with noisy labels and minimal supervision, based mainly on two types of deep learning models: the Convolutional Recurrent Neural Network (CRNN) and DenseNet.

Multiple Neural Networks with Ensemble Method for Audio Tagging with Noisy Labels and Minimal Supervision

This system uses a sigmoid-softmax activation to deal with so-called sparse multi-label classification and an ensemble method that averages models learned with multiple neural networks and various acoustic features to achieve label-weighted label-ranking average precision scores.
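
Label-weighted label-ranking average precision (lwlrap), the metric named above, can be computed directly from a binary truth matrix and a score matrix: for each true label of each clip, measure the precision at that label's rank, average per class, and weight classes by their share of positive labels. A minimal numpy sketch; the loop-based `lwlrap` function is illustrative, not the submission's actual code:

```python
import numpy as np

def lwlrap(truth, scores):
    """Label-weighted label-ranking average precision.

    truth:  (n_samples, n_classes) binary reference labels.
    scores: (n_samples, n_classes) real-valued model scores."""
    truth = np.asarray(truth)
    scores = np.asarray(scores)
    n_samples, n_classes = truth.shape
    precisions = np.zeros(truth.shape)
    for i in range(n_samples):
        pos = np.flatnonzero(truth[i])
        if pos.size == 0:
            continue
        order = np.argsort(-scores[i])               # classes by descending score
        ranks = np.empty(n_classes, dtype=int)
        ranks[order] = np.arange(1, n_classes + 1)   # 1-based rank of each class
        for c in pos:
            # Precision at class c's rank: fraction of the slots down to that
            # rank that are occupied by true labels.
            precisions[i, c] = np.sum(ranks[pos] <= ranks[c]) / ranks[c]
    class_pos = truth.sum(axis=0)
    per_class = precisions.sum(axis=0) / np.maximum(class_pos, 1)
    weights = class_pos / truth.sum()                # weight classes by label count
    return float(np.sum(per_class * weights))
```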

Receptive-field-regularized CNN variants for acoustic scene classification

This paper performs a systematic investigation of different RF configurations for various CNN architectures on the DCASE 2019 Task 1.A dataset, introduces Frequency Aware CNNs to compensate for the lack of frequency information caused by the restricted RF, and investigates if and in what RF ranges they yield additional improvement.

The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification

The receptive field (RF) of CNNs is analysed and the importance of the RF to the generalization capability of the models is demonstrated, showing that very small or very large RFs can cause performance degradation, but deep models can be made to generalize well by carefully choosing an appropriate RF size within a certain range.
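
The receptive field of a stack of convolutions can be computed in closed form: each layer grows the RF by `(kernel - 1)` times the cumulative stride of the layers before it. A small sketch under the simplifying assumptions of one dimension, no dilation, and an illustrative function name:

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions.

    layers: list of (kernel_size, stride) tuples, in order from input."""
    rf = 1     # receptive field of a single input sample
    jump = 1   # distance in input samples between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf
```

For example, two stacked 3x1 stride-1 convolutions see 5 input samples, while putting a stride-2 layer first doubles the growth contributed by every later layer.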

Emotion and Theme Recognition in Music with Frequency-Aware RF-Regularized CNNs

It is observed that ResNets with smaller receptive fields -- originally adapted for acoustic scene classification -- also perform well in the emotion tagging task, and improves the performance of such architectures using techniques such as Frequency Awareness and Shake-Shake regularization.

Towards Large Scale Ecoacoustic Monitoring with Small Amounts of Labeled Data

This work confirms that data augmentation and global temporal pooling improve performance by more than 30%, demonstrates for the first time the utility of Shapley data valuation for audio classification, and finds that the wav2vec 2.0 model trained from scratch does not improve performance.
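
Global temporal pooling, credited above with part of the improvement, simply collapses frame-level class predictions into a single clip-level prediction. A minimal numpy sketch; the `global_temporal_pool` helper is an illustrative assumption, not the study's code:

```python
import numpy as np

def global_temporal_pool(frame_probs, mode="mean"):
    """Collapse frame-level predictions of shape (n_frames, n_classes)
    into one clip-level prediction per class."""
    frame_probs = np.asarray(frame_probs)
    if mode == "mean":
        return frame_probs.mean(axis=0)   # average presence over time
    if mode == "max":
        return frame_probs.max(axis=0)    # strongest single-frame response
    raise ValueError(f"unknown pooling mode: {mode}")
```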

Evaluation of Hemodialysis Arteriovenous Bruit by Deep Learning

The analysis of arteriovenous fistula sound using deep learning has the potential to be used as an objective index in daily medical care.

References

Showing 1–10 of 20 references

ENSEMBLE OF CONVOLUTIONAL NEURAL NETWORKS FOR WEAKLY-SUPERVISED SOUND EVENT DETECTION USING MULTIPLE SCALE INPUT

The proposed model, an ensemble of convolutional neural networks to detect audio events in the automotive environment, achieved the 2nd place on audio tagging and the 1st place on sound event detection.

Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of the DCASE 2017 Challenge.

Rare Sound Event Detection Using 1D Convolutional Recurrent Neural Networks

The proposed system, using a combination of a 1D convolutional neural network and a recurrent neural network (RNN) with long short-term memory (LSTM) units, achieved the 1st place in the challenge with an error rate of 0.13 and an F-score of 93.1.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.

Language Modeling with Gated Convolutional Networks

A finite-context approach through stacked convolutions, which can be more efficient since it allows parallelization over sequential tokens, is developed; this is the first time a non-recurrent approach is competitive with strong recurrent models on these large-scale language tasks.
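
The gating mechanism of this paper, the gated linear unit (GLU), multiplies a linear transform of the input element-wise by a sigmoid gate computed from the same input. A minimal numpy sketch; the `glu` function and its parameter names are illustrative:

```python
import numpy as np

def glu(x, w, v, b, c):
    """Gated linear unit: (xW + b) * sigmoid(xV + c).

    The linear path (xW + b) is modulated element-wise by gates in (0, 1),
    controlling which features pass through."""
    gate = 1.0 / (1.0 + np.exp(-(x @ v + c)))
    return (x @ w + b) * gate
```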

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
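
The core operation of batch normalization is to standardize each feature over the mini-batch and then apply a learned scale and shift. A minimal numpy sketch of the training-time forward pass (inference-time running statistics and the backward pass are omitted):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature of x (batch, features) over the batch,
    then scale by gamma and shift by beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta
```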

Detection and classification of acoustic scenes and events: An IEEE AASP challenge

An overview of systems submitted to the public evaluation challenge on acoustic scene classification and detection of sound events within a scene as well as a detailed evaluation of the results achieved by those systems are provided.

mixup: Beyond Empirical Risk Minimization

This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
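
The convex combination at the heart of mixup is a one-liner: draw a mixing weight from a Beta distribution and blend both the inputs and their (one-hot) labels. A minimal numpy sketch; the `mixup` function name and `alpha=0.2` default are illustrative choices:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two examples and their one-hot labels with a Beta-sampled weight."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)               # mixing weight in (0, 1)
    x_mix = lam * x1 + (1 - lam) * x2          # blended input
    y_mix = lam * y1 + (1 - lam) * y2          # correspondingly blended label
    return x_mix, y_mix
```

In training, pairs are usually drawn by shuffling each mini-batch against itself, so mixup adds almost no overhead.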

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are empirically evaluated on sequence modeling tasks, and the GRU is found to be comparable to the LSTM.
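
The GRU's gating can be written out directly: an update gate interpolates between the previous hidden state and a candidate state computed through a reset gate. A minimal numpy sketch of one update step; the function names are illustrative and bias terms are omitted for brevity:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: new hidden state from input x and previous state h."""
    z = sigmoid(x @ Wz + h @ Uz)                  # update gate
    r = sigmoid(x @ Wr + h @ Ur)                  # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)      # candidate state
    return (1 - z) * h + z * h_tilde              # interpolate old and candidate
```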