A Dataset and Taxonomy for Urban Sound Research

  • Justin Salamon, Christopher Jacoby, Juan Pablo Bello
  • Published 3 November 2014
  • Computer Science
  • Proceedings of the 22nd ACM International Conference on Multimedia
Automatic urban sound classification is a growing area of research with applications in multimedia retrieval and urban informatics. In this paper we identify two main barriers to research in this area: the lack of a common taxonomy and the scarcity of large, real-world, annotated data. To address these issues we present a taxonomy of urban sounds and a new dataset, UrbanSound, containing 27 hours of audio with 18.5 hours of annotated sound event occurrences across 10 sound classes. The… 


ESC: Dataset for Environmental Sound Classification
A new annotated collection of 2,000 short clips covering 50 classes of common sound events is presented, together with a unified compilation of 250,000 unlabeled audio excerpts extracted from recordings available through the Freesound project.
Unsupervised feature learning for urban sound classification
  • J. Salamon, J. Bello
  • Computer Science
  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
Feature learning is evaluated on the largest public dataset of urban sound sources available for research and compared to a baseline system based on MFCCs. It is shown that feature learning can outperform the baseline when configured to capture the temporal dynamics of urban sources.
A Strongly-Labelled Polyphonic Dataset of Urban Sounds with Spatiotemporal Context
An accompanying hierarchical label taxonomy is introduced for SINGA: PURA, a strongly labelled polyphonic urban sound dataset with spatiotemporal context, designed to be compatible with existing datasets for urban sound tagging while also capturing sound events unique to the Singaporean context.
A Low-Cost Sound Event Detection and Identification System for Urban Environments
A proof of concept for a smart, low-cost acoustic sensor to be deployed in urban environments is presented. The device's design is described in detail in terms of its processing blocks, together with the experiments performed, their key results, and directions for future work.
MAVD: A Dataset for Sound Event Detection in Urban Environments
We describe the public release of a dataset for sound event detection in urban environments, namely MAVD, which is the first of a series of datasets planned within an ongoing research project for…
Sound event detection in urban soundscape using two-level classification
A two-level classification is proposed to classify urban sound events such as bus engine (BE), bus horn (BH), car horn (CH), and whistle (W) sounds. It outperforms existing approaches, which usually perform direct feature extraction without signal-level analysis.
Classification and mapping of sound sources in local urban streets through AudioSet data and Bayesian optimized Neural Networks
This study focuses on building Artificial Neural Network (ANN) and Recurrent Neural Network (RNN) models to classify sound sources in manually collected sound clips from local streets, using Bayesian optimization to obtain the hyperparameter values of the neural network models.
Sound analysis in smart cities
This chapter introduces the concept of smart cities and discusses the importance of sound as a source of information about urban life. It describes a wide range of applications for the computational…
Urban Sound Classification: striving towards a fair comparison
This paper presents the winning solution to DCASE 2020 Task 5, which aims to help monitor urban noise pollution, and provides a fair comparison by using the same input representation, metrics, and optimizer to assess performance.


A database and challenge for acoustic scene classification and event detection
This paper introduces a newly-launched public evaluation challenge dealing with two closely related tasks of the field: acoustic scene classification and event detection.
The authors outline the challenges posed by a cognitive sensor system and how such a system can approach a more humanlike understanding of the soundscape, for example by detecting meaningful events, deciding what data to record, and choosing where to place sensors in the field.
Audio analysis for surveillance applications
The proposed hybrid solution can detect new kinds of suspicious audio events that occur as outliers against a background of usual activity. It adaptively learns a Gaussian mixture model of the background sounds and updates the model incrementally as new audio data arrives.
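To make the idea of an incrementally updated background model concrete, here is a rough sketch of a one-dimensional online Gaussian mixture in the style of Stauffer and Grimson's adaptive background mixtures. It is an illustration under stated assumptions, not the cited paper's method: the class name, the use of a single scalar feature per frame, and every parameter (`k`, `alpha`, `match_sigmas`, `init_var`, `bg_threshold`) are hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float
    mean: float
    var: float


class OnlineBackgroundGMM:
    """Hypothetical 1-D adaptive GMM background model (Stauffer-Grimson style sketch)."""

    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5, init_var=1.0, bg_threshold=0.7):
        self.k = k                        # maximum number of Gaussian components
        self.alpha = alpha                # learning rate (weights, means, variances)
        self.match_sigmas = match_sigmas  # x "matches" a component within this many std devs
        self.init_var = init_var          # variance assigned to a newly created component
        self.bg_threshold = bg_threshold  # cumulative weight treated as background
        self.components = []

    def _ranked(self):
        # Most reliable components first: high weight, low spread.
        return sorted(range(len(self.components)),
                      key=lambda i: self.components[i].weight / math.sqrt(self.components[i].var),
                      reverse=True)

    def _match(self, x):
        # Index of the best-ranked component that explains x, or None.
        for i in self._ranked():
            c = self.components[i]
            if abs(x - c.mean) <= self.match_sigmas * math.sqrt(c.var):
                return i
        return None

    def _background(self):
        # Top-ranked components whose cumulative weight reaches bg_threshold.
        chosen, total = set(), 0.0
        for i in self._ranked():
            chosen.add(i)
            total += self.components[i].weight
            if total >= self.bg_threshold:
                break
        return chosen

    def update(self, x):
        """Return True if x is an outlier (not background), then adapt the model to x."""
        idx = self._match(x)
        is_outlier = idx is None or idx not in self._background()
        # Weights decay towards 0, except the matched component, which moves towards 1.
        for i, c in enumerate(self.components):
            c.weight = (1 - self.alpha) * c.weight + (self.alpha if i == idx else 0.0)
        if idx is None:
            # Nothing explains x: add a wide component, replacing the weakest if full.
            new = Component(weight=self.alpha, mean=x, var=self.init_var)
            if len(self.components) < self.k:
                self.components.append(new)
            else:
                weakest = min(range(self.k), key=lambda i: self.components[i].weight)
                self.components[weakest] = new
        else:
            # Simplification: alpha is reused as the mean/variance learning rate.
            c = self.components[idx]
            c.mean += self.alpha * (x - c.mean)
            c.var = max(c.var + self.alpha * ((x - c.mean) ** 2 - c.var), 1e-6)
        total = sum(c.weight for c in self.components)
        for c in self.components:
            c.weight /= total
        return is_outlier
```

Feeding the model a stream of ordinary background values lets it settle on one dominant component; a value far from that component is then reported as an outlier and seeds a new low-weight component, while ordinary values continue to be absorbed into the background.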
Research into the practical and policy applications of soundscape concepts and techniques in urban areas (NANR 200)
Executive Summary: The aim of this review was to investigate existing research into soundscape concepts and to produce recommendations for future research into the practical identification,…
Environmental Sound Recognition With Time–Frequency Audio Features
An empirical feature analysis for audio environment characterization is performed, and a matching pursuit algorithm is proposed to obtain effective time-frequency features that yield higher recognition accuracy for environmental sounds.
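Matching pursuit greedily decomposes a signal into a few atoms drawn from a dictionary. The following is a generic, minimal version in plain Python that assumes a small hypothetical dictionary of unit-norm atoms. It only shows the decomposition step. It does not reproduce the specific dictionary or the feature statistics used in the cited paper, and the example cosine dictionary is purely illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def matching_pursuit(signal, atoms, n_iter):
    """Greedy matching pursuit.

    `atoms` must be unit-norm vectors of the same length as `signal`.
    Returns a list of (atom_index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Inner product of the current residual with every atom.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        # Select the atom most correlated (in absolute value) with the residual.
        best = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coef = scores[best]
        picks.append((best, coef))
        # Remove the selected atom's contribution from the residual.
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
    return picks, residual


# Example: an orthonormal dictionary of 7 cosine atoms of length 16 (hypothetical).
N = 16
atoms = [normalize([math.cos(2 * math.pi * k * n / N) for n in range(N)]) for k in range(1, 8)]
signal = [3.0 * a + 0.5 * b for a, b in zip(atoms[2], atoms[5])]
picks, residual = matching_pursuit(signal, atoms, n_iter=2)
```

With an orthonormal dictionary the two picks recover the mixing weights exactly. Real time-frequency dictionaries are overcomplete and not orthogonal, so several iterations are usually needed and the selected atoms' parameters, rather than the raw coefficients, are what one would summarize as features.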
Spectral vs. spectro-temporal features for acoustic event detection
  • Courtenay V. Cotton, D. Ellis
  • Computer Science, Physics
  • 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
  • 2011
This work proposes an approach to detecting and modeling acoustic events that directly describes temporal context, using convolutive non-negative matrix factorization (NMF), and discovers a set of spectro-temporal patch bases that best describe the data.
Audio context recognition using audio event histograms
This paper presents a method for audio context recognition, meaning classification between everyday environments. The method is based on representing each audio context using a histogram of audio…
Classifying soundtracks with audio texture features
It is shown that the texture statistics perform as well as the best conventional statistics (based on MFCC covariance). The relative contributions of the different statistics are also examined, showing the importance of modulation spectra and cross-band envelope correlations.
The Soundscape: Our Sonic Environment and the Tuning of the World
Schafer advocates soundscape study, or interdisciplinary research on the sonic environment that combines science, society, and the arts. Extensively quoting literature, he gives an historical…