SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network

@inproceedings{Cartwright2019SONYCUS,
  title={SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network},
  author={Mark Cartwright and Ana Elisa M{\'e}ndez M{\'e}ndez and Jason Cramer and Vincent Lostanlen and Graham Dove and Ho-Hsiang Wu and Justin Salamon and Oded Nov and Juan Pablo Bello},
  booktitle={DCASE},
  year={2019}
}
SONYC Urban Sound Tagging (SONYC-UST) is a dataset for the development and evaluation of machine listening systems for real-world urban noise monitoring. It consists of 3068 audio recordings from the "Sounds of New York City" (SONYC) acoustic sensor network. Via the Zooniverse citizen science platform, volunteers tagged the presence of 23 fine-grained classes that were chosen in consultation with the New York City Department of Environmental Protection. These 23 fine-grained classes can be grouped into eight coarse-grained classes.
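As an illustration of how such per-annotator crowd tags might be turned into clip-level multilabel targets, here is a minimal sketch in Python. The file name and column names are hypothetical stand-ins rather than the dataset's exact schema, and majority voting is just one plausible aggregation rule.

```python
# Minimal sketch: aggregate per-annotator tags into clip-level multilabel
# targets, assuming a long-format file with one row per (clip, annotator).
# The path and column names below are illustrative, not the real schema.
import pandas as pd

TAG_COLUMNS = ["engine_presence", "jackhammer_presence", "siren_presence"]  # hypothetical subset of the 23 classes

ann = pd.read_csv("annotations.csv")  # hypothetical path

# Majority vote across annotators: a tag counts as present for a clip
# when at least half of its annotators marked it.
labels = (
    ann.groupby("audio_filename")[TAG_COLUMNS]
    .mean()          # fraction of annotators who tagged each class
    .ge(0.5)         # majority-vote threshold
    .astype(int)     # 0/1 multilabel targets
)
print(labels.head())
```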

Citations

SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context
TLDR
The data collection procedure is described and evaluation metrics for multilabel classification of urban sound tags are proposed and the results of a simple baseline model that exploits spatiotemporal information are reported.
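The exact evaluation metrics are defined in the DCASE challenge materials; as a rough sketch, a macro-averaged AUPRC over multilabel tags can be computed with scikit-learn on toy data as follows.

```python
# Hedged sketch of a macro-averaged AUPRC metric for multilabel tagging,
# in the spirit of the evaluation described above; the official DCASE
# metric differs in details (e.g., coarse/fine taxonomy handling).
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0]])           # clip-level tags
y_score = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])

macro_auprc = average_precision_score(y_true, y_score, average="macro")
print(f"macro-AUPRC: {macro_auprc:.3f}")
```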
INCORPORATING AUXILIARY DATA FOR URBAN SOUND TAGGING Technical Report
TLDR
A feature vector is constructed from the spatiotemporal metadata and used in parallel with log-mel spectrogram features to facilitate sound tagging, and the presence of multiple annotations per recording is addressed with a pseudo-labelling technique.
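One plausible form of such a spatiotemporal feature vector is sketched below, using cyclical sine/cosine encodings of time plus sensor coordinates. The encoding choice and function name are assumptions for illustration, not the report's exact features.

```python
# Hypothetical spatiotemporal feature vector: cyclical encodings of hour
# and weekday plus sensor coordinates. A sketch of the auxiliary-feature
# idea above, not the technical report's actual implementation.
import numpy as np

def spatiotemporal_features(hour: int, weekday: int, lat: float, lon: float) -> np.ndarray:
    return np.array([
        np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24),
        np.sin(2 * np.pi * weekday / 7), np.cos(2 * np.pi * weekday / 7),
        lat, lon,  # in practice these would be normalized to the sensor area
    ])

aux = spatiotemporal_features(hour=14, weekday=2, lat=40.73, lon=-73.99)
# This vector would then be concatenated with (or fed in parallel to)
# log-mel spectrogram features inside the tagging model.
print(aux.shape)  # (6,)
```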
Multimodal Urban Sound Tagging with Spatiotemporal Context
TLDR
A multimodal UST system that jointly mines audio and spatiotemporal context, with a data-filtering approach in the text processing stage to further improve multimodal performance.
Cluster Analysis of Urban Acoustic Environments on Barcelona Sensor Network Data
TLDR
Unsupervised learning techniques are applied to sound pressure levels captured by acoustic sensors to discover behavior patterns in both time and space; clustering on yearly acoustic indexes enables the identification of distinct urban acoustic environments.
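As a toy illustration of that clustering idea, the sketch below groups sensors by yearly acoustic-index profiles with k-means; the data is a random stand-in, not Barcelona network measurements.

```python
# Sketch: cluster sensors by yearly acoustic-index vectors (e.g., hourly
# Leq averaged over a year). Random stand-in data, illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
yearly_profiles = rng.normal(60, 8, size=(50, 24))  # 50 sensors x 24 hourly Leq values (dB)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(yearly_profiles)
print(kmeans.labels_[:10])  # cluster id per sensor = candidate acoustic environment
```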
WASN-Based Day–Night Characterization of Urban Anomalous Noise Events in Narrow and Wide Streets
TLDR
Results confirm the unbalanced nature of the problem (road traffic noise, RTN, represents 83.5% of the data) while identifying 26 anomalous noise event (ANE) subcategories, mainly derived from pedestrians, animals, transport, and industry, which becomes especially relevant for the WASN-based computation of equivalent RTN levels.
Polyphonic training set synthesis improves self-supervised urban sound classification.
TLDR
A two-stage approach pre-trains audio classifiers on a task whose ground truth is trivially available; it benefits overall performance more than self-supervised learning, and the geographical origin of the acoustic events used in training set synthesis appears to have a decisive impact.
Convolutional Neural Networks Based System for Urban Sound Tagging with Spatiotemporal Context
TLDR
This paper proposes a convolutional neural network (CNN) based system for UST with spatiotemporal context and shows that the proposed system significantly outperforms the baseline system on the evaluation metrics.
Low-Cost Distributed Acoustic Sensor Network for Real-Time Urban Sound Monitoring
TLDR
A highly scalable, low-cost distributed infrastructure is presented that features a ubiquitous acoustic sensor network to monitor urban sounds, enabling practitioners to acoustically populate urban spaces and obtain a reliable real-time view of the noises occurring there.
Voice Anonymization in Urban Sound Recordings
TLDR
A method is presented to anonymize and blur the voices of people recorded in public spaces, a novel yet increasingly important task as acoustic sensing becomes ubiquitous in sensor-equipped smart cities.
Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes
TLDR
The proposed weakly supervised source separation offers a means of leveraging clip-level source annotations to train source separation models; the models are augmented with modified loss functions to bridge the gap between source separation and source-specific sound level estimation (SSSLE) and to address the presence of background noise.
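The paper's specific losses are not reproduced here; the sketch below only shows the general idea of exploiting clip-level presence labels in a separation objective. Both terms, and the function itself, are generic assumptions.

```python
# Generic sketch (not the paper's exact losses): sources tagged absent at
# the clip level are pushed toward silence, and the estimated sources plus
# a background estimate must reconstruct the mixture.
import torch

def weak_label_separation_loss(est_sources, est_background, mixture, presence):
    """est_sources: (batch, n_sources, time); presence: (batch, n_sources) in {0, 1}."""
    # Mixture consistency: separated sources + background should sum to the input.
    recon = est_sources.sum(dim=1) + est_background
    recon_loss = torch.mean((recon - mixture) ** 2)
    # Absent-source penalty: energy of sources whose clip-level tag is "not present".
    absent = (1 - presence).unsqueeze(-1)
    silence_loss = torch.mean((absent * est_sources) ** 2)
    return recon_loss + silence_loss

# Toy usage with random tensors.
x = torch.randn(2, 16000)
est = torch.randn(2, 3, 16000, requires_grad=True)
bg = torch.randn(2, 16000)
tags = torch.tensor([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
print(weak_label_separation_loss(est, bg, x, tags))
```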

References

Showing 1-10 of 21 references
A Dataset and Taxonomy for Urban Sound Research
TLDR
A taxonomy of urban sounds and a new dataset, UrbanSound, containing 27 hours of audio with 18.5 hours of annotated sound event occurrences across 10 sound classes are presented.
TUT database for acoustic scene classification and sound event detection
TLDR
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.
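As a concrete illustration of that style of baseline, here is a minimal MFCC + GMM scene classifier; the file paths and the helper function are hypothetical.

```python
# Minimal sketch of an MFCC + GMM scene-classification baseline:
# one Gaussian mixture per scene class, scored by log-likelihood.
# File paths and the helper below are illustrative placeholders.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T  # (frames, 20)

# Train: fit one GMM on all MFCC frames of each scene class.
gmms = {scene: GaussianMixture(n_components=8).fit(np.vstack([mfcc_frames(f) for f in files]))
        for scene, files in {"street": ["street1.wav"], "park": ["park1.wav"]}.items()}

# Classify: pick the scene whose GMM gives the highest mean log-likelihood.
test = mfcc_frames("unknown.wav")
print(max(gmms, key=lambda s: gmms[s].score(test)))
```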
Scaper: A library for soundscape synthesis and augmentation
TLDR
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined "specification", to increase the variability of the output.
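Scaper's API follows this probabilistic-specification pattern; a short usage sketch follows, with placeholder foreground/background folder paths.

```python
# Usage sketch of Scaper's probabilistic specification API; the
# "foreground" and "background" folder paths are placeholders.
import scaper

sc = scaper.Scaper(duration=10.0, fg_path="foreground", bg_path="background")
sc.ref_db = -20

# Background drawn at random from the available background labels.
sc.add_background(label=("choose", []),
                  source_file=("choose", []),
                  source_time=("const", 0))

# One foreground event with randomized timing, duration, and SNR.
sc.add_event(label=("choose", []),
             source_file=("choose", []),
             source_time=("const", 0),
             event_time=("uniform", 0, 8),
             event_duration=("truncnorm", 3, 1, 0.5, 6),
             snr=("uniform", 6, 24),
             pitch_shift=None,
             time_stretch=None)

# Each call to generate() samples a new soundscape from the same spec.
sc.generate("soundscape.wav", "soundscape.jams")
```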
Crowdsourcing Multi-label Audio Annotation Tasks with Citizen Scientists
TLDR
This paper describes data collection on the Zooniverse citizen science platform and compares the efficiency of different audio annotation strategies: multiple-pass binary annotation, single-pass multi-label annotation, and a hybrid approach, hierarchical multi-pass multi-label annotation.
An Open Dataset for Research on Audio Field Recording Archives: freefield1010
TLDR
A free and open dataset of 7690 audio clips sampled from the field-recording tag in the Freesound audio archive is introduced; the data preparation process is described, the dataset is characterised descriptively, and its use is illustrated through an auto-tagging experiment.
Seeing Sound
TLDR
Results show that more complex audio scenes result in lower annotator agreement, and that spectrogram visualizations are superior, producing higher-quality annotations at a lower cost in time and human labor.
The Life of a New York City Noise Sensor Network
TLDR
The entire network infrastructure is outlined, including the operation of the sensors, followed by an analysis of its data yield, the development of the fault-detection approach, and future plans for system integration.
CNN architectures for large-scale audio classification
TLDR
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on this audio classification task and that larger training and label sets help up to a point.
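For orientation, here is a minimal PyTorch CNN over log-mel spectrogram inputs in that image-classification style; this is a generic sketch, not an architecture from the paper.

```python
# Generic sketch of a small CNN audio tagger over log-mel spectrograms,
# in the image-style spirit discussed above. Shapes are illustrative.
import torch
import torch.nn as nn

class MelCNN(nn.Module):
    def __init__(self, n_classes: int = 23):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, 1, mel_bins, frames)
        h = self.features(x).flatten(1)
        return self.classifier(h)  # logits; use sigmoid + BCE for multilabel tags

logits = MelCNN()(torch.randn(4, 1, 64, 431))
print(logits.shape)  # torch.Size([4, 23])
```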
TUT Sound events 2017, Development dataset
TUT Sound Events 2017, development dataset consists of 24 audio recordings from a single acoustic scene, street (outdoor), totaling 1:32:08 of audio.
Freesound technical demo
TLDR
This demo introduces Freesound to the multimedia community and shows its potential as a research resource.