Scaper: A library for soundscape synthesis and augmentation

  • Justin Salamon, Duncan MacConnell, M. Cartwright, Peter Qi Li, Juan Pablo Bello
  • Published 1 October 2017
  • Computer Science
  • 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Sound event detection (SED) in environmental recordings is a key topic of research in machine listening, with applications in noise monitoring for smart cities, self-driving cars, surveillance, bioacoustic monitoring, and indexing of large multimedia collections. Developing new solutions for SED often relies on the availability of strongly labeled audio recordings, where the annotation includes the onset, offset and source of every event. Generating such precise annotations manually is very…

Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
This paper treats SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality, and develops a family of adaptive pooling operators, referred to as autopool, which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes
The proposed weakly supervised source separation methods offer a means of leveraging clip-level source annotations to train source separation models, which are augmented with modified loss functions to bridge the gap between source separation and source-specific sound level estimation (SSSLE) and to address the presence of background.
An analysis of Sound Event Detection under acoustic degradation using multi-resolution systems
This paper analyzes the performance of Sound Event Detection systems under diverse artificial acoustic conditions such as high- or low-pass filtering and clipping or dynamic range compression, as well as under a scenario of high overlap between events.
Comparative Assessment of Data Augmentation for Semi-Supervised Polyphonic Sound Event Detection
This work proposes a CRNN system exploiting unlabeled data with semi-supervised learning based on the “Mean teacher” method, in combination with data augmentation to overcome the limited size of the training dataset and to further improve performance.
Sound Event Detection and Separation: A Benchmark on DESED Synthetic Soundscapes
  • Nicolas Turpault, R. Serizel, J. Salamon
  • Physics, Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
It is shown that temporal localization of sound events remains a challenge for SED systems and that reverberation and non-target sound events severely degrade system performance.
Sound Event Detection Using Point-Labeled Data
  • B. Kim, B. Pardo
  • Computer Science
    2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
  • 2019
This work illustrates methods to train a SED model on point-labeled data and shows that a model trained on point-labeled audio data significantly outperforms weak models and is comparable to a model trained on strongly labeled data.
A Strongly-Labelled Polyphonic Dataset of Urban Sounds with Spatiotemporal Context
An accompanying hierarchical label taxonomy is introduced for SINGA:PURA, a strongly labelled polyphonic urban sound dataset with spatiotemporal context, designed to be compatible with other existing datasets for urban sound tagging while also being able to capture sound events unique to the Singaporean context.
Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning
This work proposes a method for generating sounds via neural discrete time-frequency representation learning, conditioned on sound classes, which offers an advantage in efficiently modelling long-range dependencies and retaining local fine-grained structures within sound clips.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year’s dataset with an additional synthetic, strongly labeled dataset provided this year, which is described in more detail.
Polyphonic training set synthesis improves self-supervised urban sound classification
A two-stage approach is proposed to pre-train audio classifiers on a task whose ground truth is trivially available. Training set synthesis is found to benefit overall performance more than self-supervised learning, and the geographical origin of the acoustic events in training set synthesis appears to have a decisive impact.


Weakly-supervised audio event detection using event-specific Gaussian filters and fully convolutional networks
A model based on convolutional neural networks that relies only on weakly-supervised data for training and is able to detect frame-level information, e.g., the temporal position of sounds, even when it is trained merely with clip-level labels.
Audio analysis for surveillance applications
The proposed hybrid solution is capable of detecting new kinds of suspicious audio events that occur as outliers against a background of usual activity; it adaptively learns a Gaussian mixture model of the background sounds and updates the model incrementally as new audio data arrives.
A Morphological Model for Simulating Acoustic Scenes and Its Application to Sound Event Detection
This paper introduces a model for simulating environmental acoustic scenes that abstracts temporal structures from audio recordings. This model allows us to explicitly control key morphological
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources
ESSENTIA: an open-source library for sound and music analysis
Essentia 2.0, an open-source C++ library for audio analysis and audio-based music information retrieval, is presented, which contains an extensive collection of reusable algorithms which implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors.
A Software Framework for Musical Data Augmentation
This work develops a general software framework for augmenting annotated musical datasets, which will allow practitioners to easily expand training sets with musically motivated perturbations of both audio and annotations.
A 3-D Immersive Synthesizer for Environmental Sounds
This paper presents the design of a 3-D immersive synthesizer dedicated to environmental sounds, intended for use in interactive virtual reality applications, together with an original approach that exploits the synthesis capabilities to simulate the spatial extension of sound sources.
PySOX: Leveraging the Audio Signal Processing Power of SOX in Python
SoX is a popular command line tool for sound processing. Among many other processes, it allows users to perform a repeated process (e.g. file conversion) over a large batch of audio files and apply a
The Implementation of Low-cost Urban Acoustic Monitoring Devices
State of the Art in Sound Texture Synthesis
An overview is given of analysis methods used for sound texture synthesis, such as segmentation, statistical modeling, timbral analysis, and modeling of transitions.