Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research

@article{McFee2019OpenSourcePF,
  title={Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research},
  author={Brian McFee and Jong Wook Kim and Mark Cartwright and Justin Salamon and Rachel M. Bittner and Juan Pablo Bello},
  journal={IEEE Signal Processing Magazine},
  year={2019},
  volume={36},
  pages={128--137}
}
In the early years of music information retrieval (MIR), research problems were often centered on conceptually simple tasks, and methods were evaluated on small, idealized data sets. A canonical example is genre recognition (i.e., which one of n genres describes this song?), which was often evaluated on the GTZAN data set (1,000 musical excerpts balanced across ten genres) [1]. As task definitions were simple, so too were signal analysis pipelines, which often derived from methods for…
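The kind of pipeline described above (a hand-crafted spectral feature feeding a simple classifier) can be illustrated with a toy, NumPy-only sketch. The spectral-centroid feature, the synthetic "genres," and the nearest-centroid classifier here are illustrative stand-ins, not the GTZAN setup or any method from the paper:

```python
import numpy as np

def spectral_centroid(signal, sr=22050):
    """Mean spectral centroid (Hz): a classic timbral feature used in
    early genre-recognition pipelines."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

def make_tone(freq, sr=22050, dur=0.5):
    t = np.arange(int(sr * dur)) / sr
    return np.sin(2 * np.pi * freq * t)

# Toy "genres": class 0 = low-pitched tones, class 1 = high-pitched tones.
train = [(make_tone(f), 0) for f in (200, 250, 300)] + \
        [(make_tone(f), 1) for f in (2000, 2500, 3000)]

# Nearest-centroid classifier on the one-dimensional feature.
centroids = {}
for label in (0, 1):
    feats = [spectral_centroid(x) for x, y in train if y == label]
    centroids[label] = np.mean(feats)

def classify(signal):
    f = spectral_centroid(signal)
    return min(centroids, key=lambda c: abs(centroids[c] - f))

print(classify(make_tone(220)))   # a low tone falls in class 0
print(classify(make_tone(2200)))  # a high tone falls in class 1
```

Real systems of the era used richer feature sets (MFCCs, chroma, rhythm features) and proper classifiers, but the structure — feature extraction followed by supervised classification — is the same.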

Citations

Computational Methods for Melody and Voice Processing in Music Recordings (Dagstuhl Seminar 19052)
Current challenges in academic and industrial research in view of the recent advances in deep learning and data-driven models are discussed and novel applications of these technologies in music and multimedia retrieval, content creation, musicology, education, and human-computer interaction are explored.
Open-Unmix - A Reference Implementation for Music Source Separation
Open-Unmix provides implementations for the most popular deep learning frameworks, giving researchers a flexible way to reproduce results, and includes a pre-trained model for end users and even artists to try source separation.
An Educational Guide through the FMP Notebooks for Teaching and Learning Fundamentals of Music Processing
This paper provides a guide through the FMP notebooks, a comprehensive collection of educational material for teaching and learning fundamentals of music processing (FMP), with a particular focus on…
Codified audio language modeling learns useful representations for music information retrieval
The strength of Jukebox's representations is interpreted as evidence that modeling audio instead of tags provides richer representations for MIR.
Exploring Quality and Generalizability in Parameterized Neural Audio Effects
It was found that limiting the audio content of the dataset, for example using datasets of just a single instrument, provided a significant improvement in model accuracy over models trained on more general datasets.
Filosax: A Dataset of Annotated Jazz Saxophone Recordings
The criteria used for choosing and sourcing the repertoire, the recording process and the semi-automatic transcription pipeline are outlined, and the use of the dataset to analyse musical phenomena such as swing timing and dynamics of typical musical figures is demonstrated.
Feature Extraction of Music Signal Based on Adaptive Wave Equation Inversion
The digitization, analysis, and processing technology of music signals are the core of digital music technology. There is generally a preprocessing process before the music signal processing. The
USING THE SYNC TOOLBOX FOR AN EXPERIMENT ON HIGH-RESOLUTION MUSIC ALIGNMENT
This work combines spectral flux, used as onset features, with conventional chroma features to increase alignment accuracy, and conducts experiments within the Sync Toolbox framework showing that this approach preserves the accuracy of another high-resolution approach while being computationally simpler.
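Alignment approaches of this kind build on dynamic time warping (DTW). As a NumPy-only sketch of the core idea — not the Sync Toolbox API, and using scalar features where real systems align chroma vectors — the following computes an optimal warping path between two one-dimensional feature sequences:

```python
import numpy as np

def dtw_path(x, y):
    """Minimal dynamic time warping between two 1-D feature sequences.
    Returns the accumulated-cost matrix and the optimal warping path."""
    n, m = len(x), len(y)
    cost = np.abs(np.subtract.outer(x, y))   # local cost: |x_i - y_j|
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(acc[i - 1, j] if i > 0 else np.inf,
                       acc[i, j - 1] if j > 0 else np.inf,
                       acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            acc[i, j] = cost[i, j] + prev
    # Backtrack from the end to recover the optimal path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
        if i > 0:
            candidates.append((acc[i - 1, j], (i - 1, j)))
        if j > 0:
            candidates.append((acc[i, j - 1], (i, j - 1)))
        _, (i, j) = min(candidates, key=lambda c: c[0])
        path.append((i, j))
    return acc, path[::-1]

# Align a sequence against a time-stretched copy of itself:
a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0])
acc, path = dtw_path(a, b)   # total cost acc[-1, -1] is 0: a perfect warp exists
```

High-resolution alignment methods refine such a coarse DTW path with finer features (e.g., onset-sensitive ones) near the path, rather than running DTW at full resolution everywhere.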
Can't trust the feeling? How open data reveals unexpected behavior of high-level music descriptors
High-level classifier-based music descriptor output in AcousticBrainz is analyzed, indicating that descriptor values should not be taken as absolute truth, and hinting at directions for more comprehensive descriptor testing that are overlooked in common machine learning evaluation and quality assurance setups.
FSD50K: An Open Dataset of Human-Labeled Sound Events
FSD50K is introduced, an open dataset containing over 51k audio clips totaling over 100 h of audio, manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster sound event recognition (SER) research.

References

Showing 1-10 of 72 references
madmom: A New Python Audio and Music Signal Processing Library
Madmom is an open-source audio processing and music information retrieval (MIR) library written in Python. It features a concise, NumPy-compatible, object-oriented design with simple calling conventions and sensible default values for all parameters, which facilitates fast prototyping of MIR applications.
Musical genre classification of audio signals
The automatic classification of audio signals into a hierarchy of musical genres is explored, and three feature sets for representing timbral texture, rhythmic content, and pitch content are proposed.
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Features?
This work analyzes the robustness of MFCCs and chroma features to sampling rate, codec, bitrate, frame size, and music genre, and estimates the practical effects on a sample task such as genre classification.
Computer-aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency
Tony, a software tool for the interactive annotation of melodies from monophonic audio recordings, is presented, and it is shown that Tony’s built in automatic note transcription method compares favourably with existing tools.
Scaper: A library for soundscape synthesis and augmentation
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined “specification,” increasing the variability of the output.
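The sequencing idea — place events into a background at sampled onsets and record the resulting annotations — can be sketched in plain NumPy. The function name, SNR handling, and annotation format below are illustrative assumptions, not Scaper's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000

def mix_soundscape(background, events, sr=16000, duration=2.0, snr_db=6.0):
    """Place each event at a random onset within the background and mix it
    in at roughly the requested signal-to-noise ratio. Returns the mixture
    and the (onset_seconds, label) annotations — the key point being that
    the synthesizer produces ground-truth annotations for free."""
    out = background[: int(sr * duration)].copy()
    annotations = []
    for label, event in events:
        onset = rng.uniform(0, duration - len(event) / sr)
        start = int(onset * sr)
        # Scale the event relative to the current mixture level.
        bg_rms = np.sqrt(np.mean(out ** 2)) + 1e-12
        ev_rms = np.sqrt(np.mean(event ** 2)) + 1e-12
        gain = (bg_rms / ev_rms) * 10 ** (snr_db / 20)
        out[start : start + len(event)] += gain * event
        annotations.append((onset, label))
    return out, annotations

background = 0.01 * rng.standard_normal(int(sr * 2.0))
beep = np.sin(2 * np.pi * 880 * np.arange(int(sr * 0.2)) / sr)
mixture, ann = mix_soundscape(background, [("beep", beep), ("beep", beep)])
```

Re-running with different random seeds yields many distinct, fully annotated soundscapes from one specification, which is what makes this style of synthesis useful for training and evaluation data.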
The Audio Degradation Toolbox and Its Application to Robustness Evaluation
It is demonstrated that specific degradations can reduce or even reverse the performance difference between two competing methods, and it is shown that performance strongly depends on the combination of method and degradation applied.
MARSYAS: a framework for audio analysis
This paper describes MARSYAS, a framework for experimenting, evaluating and integrating techniques for audio content analysis in restricted domains and a new method for temporal segmentation based on audio texture that is combined with audio analysis techniques and used for hierarchical browsing, classification and annotation of audio files.
librosa: Audio and Music Signal Analysis in Python
A brief overview of the librosa library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.
A Software Framework for Musical Data Augmentation
This work develops a general software framework for augmenting annotated musical datasets, which will allow practitioners to easily expand training sets with musically motivated perturbations of both audio and annotations.
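The central point — that deforming the audio must deform the annotations in lockstep — can be illustrated with a deliberately naive time-stretch by resampling (which also shifts pitch; real augmentation frameworks use, e.g., phase-vocoder stretching). All names here are hypothetical, not the framework's API:

```python
import numpy as np

def time_stretch(audio, onsets, rate, sr=22050):
    """Naive time-stretch by linear resampling. The point of the sketch:
    onset annotations must be scaled together with the audio, or the
    augmented training pair becomes inconsistent."""
    n_out = int(round(len(audio) / rate))
    t_in = np.arange(n_out) * rate              # output sample -> input sample
    stretched = np.interp(t_in, np.arange(len(audio)), audio)
    new_onsets = [t / rate for t in onsets]     # annotation times scale by 1/rate
    return stretched, new_onsets

sr = 22050
audio = np.zeros(sr)       # one second of silence...
audio[sr // 2] = 1.0       # ...with a click at 0.5 s
stretched, onsets = time_stretch(audio, [0.5], rate=0.5, sr=sr)
# rate=0.5 -> twice as slow: 2 s of audio, with the click annotated at 1.0 s
```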
MIR_EVAL: A Transparent Implementation of Common MIR Metrics
Central to the field of MIR research is the evaluation of algorithms used to extract information from music data. We present mir_eval, an open-source software library which provides a transparent and…
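The kind of metric such a library standardizes can be sketched in pure Python. This greedy-matching onset F-measure is a simplified illustration of the idea, not mir_eval's implementation (which performs a more careful optimal matching between reference and estimated events):

```python
def onset_f_measure(reference, estimated, window=0.05):
    """F-measure for onset detection with a +/- `window` second tolerance,
    using greedy one-to-one matching of the sorted onset lists."""
    ref = sorted(reference)
    est = sorted(estimated)
    matched = 0
    j = 0
    for r in ref:
        # Skip estimates that are too early to match this reference onset.
        while j < len(est) and est[j] < r - window:
            j += 1
        if j < len(est) and abs(est[j] - r) <= window:
            matched += 1
            j += 1
    if not ref or not est:
        return 0.0
    precision = matched / len(est)
    recall = matched / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two of three reference onsets are matched within 50 ms -> F = 2/3.
print(onset_f_measure([0.5, 1.0, 1.5], [0.51, 1.04, 2.0]))
```

Even a metric this simple hides choices (tolerance window, matching strategy, handling of empty lists) that silently differ between ad hoc implementations — which is precisely the argument for a shared, transparent reference implementation.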