Corpus ID: 225041175

Urban Sound Classification : striving towards a fair comparison

@article{Arnault2020UrbanSC,
  title={Urban Sound Classification : striving towards a fair comparison},
  author={Augustin Arnault and Baptiste Hanssens and Nicolas Riche},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.11805}
}
Urban sound classification has achieved remarkable progress and remains an active research area in audio pattern recognition. In particular, it enables the monitoring of noise pollution, which is a growing concern for large cities. The contribution of this paper is two-fold. First, we present our DCASE 2020 Task 5 winning solution, which aims at helping the monitoring of urban noise pollution. It achieves a macro-AUPRC of 0.82 / 0.62 for the coarse / fine classification on the validation set…
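For readers unfamiliar with the reported metric: macro-AUPRC is the unweighted mean, over classes, of the per-class area under the precision-recall curve (average precision). A minimal numpy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def average_precision(y_true, y_score):
    """AP over a ranked list: mean of the precision at each positive hit."""
    order = np.argsort(-np.asarray(y_score))
    y = np.asarray(y_true)[order]
    precision = np.cumsum(y) / np.arange(1, len(y) + 1)
    return precision[y == 1].sum() / y.sum()

def macro_auprc(Y_true, Y_score):
    """Unweighted mean of per-class AP (the macro-AUPRC reported above)."""
    return float(np.mean([average_precision(Y_true[:, c], Y_score[:, c])
                          for c in range(Y_true.shape[1])]))
```

A ranking that puts every positive first yields an AP of 1.0 for that class; macro averaging then weights rare and common classes equally, which is why it is a natural choice for imbalanced tagging tasks.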

Citations

ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio
A new time-frequency transformation layer based on complex frequency B-spline (fbsp) wavelets, used with a high-performance audio classification model, provides an accuracy improvement over the previously used Short-Time Fourier Transform (STFT) on standard datasets.
Learning spectro-temporal representations of complex sounds with parameterized neural networks
A parameterized neural network layer is proposed that computes specific spectro-temporal modulations via learnable Gabor-based spectro-temporal receptive fields (STRFs) and is fully interpretable.

References

Showing 1–10 of 32 references
ESResNet: Environmental Sound Classification Based on Visual Domain Models
This work presents a model that is inherently compatible with mono and stereo sound inputs and outperforms all previously known approaches in a fair comparison, based on simple log-power Short-Time Fourier Transform (STFT) spectrograms.
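As a concrete illustration of the log-power STFT features mentioned above, here is a minimal numpy sketch (the window size and hop length are arbitrary defaults, not the paper's configuration):

```python
import numpy as np

def log_power_stft(signal, n_fft=512, hop=256):
    """Log-power STFT spectrogram: Hann-windowed frames -> |rFFT|^2 -> dB."""
    window = np.hanning(n_fft)
    frames = np.stack([signal[i:i + n_fft] * window
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-10)  # shape: (num_frames, n_fft//2 + 1)

# A 1 kHz tone sampled at 16 kHz should peak at bin 1000 * 512 / 16000 = 32.
tone = np.sin(2 * np.pi * 1000 * np.arange(4096) / 16000)
spec = log_power_stft(tone)
```

The `+ 1e-10` floor avoids `log10(0)` on silent frames; production pipelines typically use an optimized STFT (e.g. from librosa or torchaudio) rather than this loop.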
Urban Sound Tagging using Convolutional Neural Networks
  • Sainath Adapa
  • Computer Science, Engineering
  • Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
  • 2019
It is shown that using pre-trained image classification models along with data augmentation techniques results in higher performance over alternative approaches.
A Dataset and Taxonomy for Urban Sound Research
A taxonomy of urban sounds and a new dataset, UrbanSound, containing 27 hours of audio with 18.5 hours of annotated sound event occurrences across 10 sound classes are presented.
SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context
The data collection procedure is described, evaluation metrics for multi-label classification of urban sound tags are proposed, and the results of a simple baseline model that exploits spatiotemporal information are reported.
ESC: Dataset for Environmental Sound Classification
A new annotated collection of 2,000 short clips comprising 50 classes of various common sound events, and an abundant unified compilation of 250,000 unlabeled auditory excerpts extracted from recordings available through the Freesound project, are presented.
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a "shallow" dictionary learning model with augmentation.
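The augmentations in that line of work operate directly on the waveform (time stretching, pitch shifting, dynamic range compression, background noise). As a hedged illustration of the general idea, not that paper's exact pipeline, two simple waveform augmentations in numpy:

```python
import numpy as np

def augment(signal, rng):
    """Illustrative waveform augmentations: random circular time shift
    plus additive Gaussian noise (noise level chosen arbitrarily)."""
    shifted = np.roll(signal, rng.integers(0, len(signal)))
    return shifted + rng.normal(0.0, 0.005, size=signal.shape)

rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # 1 s of a 440 Hz tone
augmented = augment(clip, rng)
```

Each call produces a different variant of the same clip, effectively multiplying the size of the labeled training set.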
Detection and Classification of Acoustic Scenes and Events
The state of the art in automatically classifying audio scenes, and in automatically detecting and classifying audio events, is reported on.
Learning Environmental Sounds with Multi-scale Convolutional Neural Network
A novel end-to-end network called WaveMsNet is proposed, based on a multi-scale convolution operation and a two-phase method, which obtains better audio representations by improving the frequency resolution and learning filters across all frequency areas.
A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling
This paper builds a neural network called TALNet, which is the first system to reach state-of-the-art audio tagging performance on Audio Set, while exhibiting strong localization performance on the DCASE 2017 challenge at the same time.
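The pooling functions compared in that paper map frame-level probabilities to a single clip-level probability per class; three of the five (max, average, and linear softmax) can be sketched in a few lines of numpy:

```python
import numpy as np

# p: frame-level probabilities, shape (num_frames, num_classes)

def max_pool(p):
    """Clip probability = most confident frame."""
    return p.max(axis=0)

def avg_pool(p):
    """Clip probability = mean over frames."""
    return p.mean(axis=0)

def linear_softmax_pool(p):
    """Each frame weighted by its own probability: sum(p^2) / sum(p)."""
    return (p ** 2).sum(axis=0) / p.sum(axis=0)

p = np.array([[1.0, 0.2],
              [0.5, 0.2]])
```

Max pooling under-propagates gradients to all but one frame, average pooling dilutes short events; linear softmax sits between the two, which is one reason such comparisons matter for weakly labeled training.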
Weight Standardization
Weight Standardization is proposed to accelerate deep network training by standardizing the weights in the convolutional layers, which smooths the loss landscape by reducing the Lipschitz constants of the loss and the gradients.
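The transform itself is simple: before each convolution, the weights of every output channel are shifted and scaled to zero mean and unit variance. A numpy sketch of just the transform (framework integration omitted):

```python
import numpy as np

def standardize_weights(W, eps=1e-5):
    """Standardize conv weights per output channel, as described in
    Weight Standardization; W has shape (out_ch, in_ch, kh, kw)."""
    flat = W.reshape(W.shape[0], -1)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    return ((flat - mean) / (std + eps)).reshape(W.shape)

W = np.random.default_rng(0).normal(size=(8, 3, 3, 3))
Ws = standardize_weights(W)
```

In a real model this is applied to the weights on every forward pass (not once at initialization), so gradients flow through the standardization.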