• Corpus ID: 64909709

PySOX: Leveraging the Audio Signal Processing Power of SOX in Python

@inproceedings{Bittner2016PySOXLT,
  title={PySOX: Leveraging the Audio Signal Processing Power of SOX in Python},
  author={Rachel M. Bittner and Eric J. Humphrey and Juan Pablo Bello},
  year={2016}
}
SoX is a popular command line tool for sound processing. Among many other processes, it allows users to perform a repeated process (e.g. file conversion) over a large batch of audio files and apply chains of audio effects (e.g. compression, reverb) in a single line of code. SoX has proven to be a useful resource for Music Information Retrieval (MIR) tasks, and in particular for dataset creation. While the library is powerful and stable, building long strings of command line arguments can be…
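To make the abstract's point about batch conversion and effect chains concrete, here is a minimal sketch of assembling SoX command lines from Python. It is not the PySOX API: the helper names `build_sox_command` and `batch_commands` are hypothetical, and the effect arguments in the usage note are illustrative values, not settings taken from the paper. The sketch only builds argument lists; it does not invoke SoX.

```python
from pathlib import Path
from typing import Iterable, List, Sequence


def build_sox_command(src: Path, dst: Path,
                      effects: Iterable[Sequence[str]]) -> List[str]:
    """Return the argument list for one call: sox <src> <dst> <effect args...>.

    Hypothetical helper for illustration; each effect is a sequence such as
    ["reverb", "50"] that is appended in order to form the effect chain.
    """
    cmd = ["sox", str(src), str(dst)]
    for effect in effects:
        cmd.extend(effect)
    return cmd


def batch_commands(sources: Iterable[str], out_dir: str, out_ext: str,
                   effects: Sequence[Sequence[str]]) -> List[List[str]]:
    """Build one command per input file, changing the extension to out_ext."""
    commands = []
    for source in sources:
        src = Path(source)
        dst = Path(out_dir) / (src.stem + out_ext)
        commands.append(build_sox_command(src, dst, effects))
    return commands
```

For example, `batch_commands(["a.wav", "b.wav"], "out", ".flac", [["norm", "-3"], ["reverb", "50"]])` yields two argument lists, each of which could be passed to `subprocess.run` on a machine where SoX is installed. Keeping the commands as lists rather than one long string avoids shell quoting problems with file names that contain spaces.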
pyDAW: A Pragmatic CLI for Digital Audio Processing
Digital Audio Workstations (DAW) are tools for mastering and mixing audio files, in the broader context of large-scale audio processing. Among many other processes, they allow users to perform
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
TLDR
By using notes as an intermediate representation, a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude is trained, a process the authors call Wave2Midi2Wave.
DCASE-MODELS: A PYTHON LIBRARY FOR COMPUTATIONAL ENVIRONMENTAL SOUND ANALYSIS USING DEEP–LEARNING MODELS
TLDR
This document presents DCASE-models, an open-source Python library for rapid prototyping of environmental sound analysis systems, with an emphasis on deep-learning models, which includes a model interface to standardize the interaction of machine learning methods with the other system components.
ENABLING FACTORIZED PIANO MUSIC MODELING
TLDR
By using notes as an intermediate representation, a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude is trained, a process the authors call Wave2Midi2Wave.
N-HANS: A neural network-based toolkit for in-the-wild audio enhancement
TLDR
The Neuro-Holistic Audio-eNhancement System is presented, a Python toolkit for in-the-wild audio enhancement that includes functionalities for audio denoising, source separation, and, for the first time in such a toolkit, selective noise suppression.
Who Calls The Shots? Rethinking Few-Shot Learning for Audio
TLDR
A series of experiments leads to audio-specific insights on few-shot learning, some of which are at odds with recent findings in the image domain: there is no single best one-size-fits-all model, method, or support set selection criterion; the best choice depends on the expected application scenario.
Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition
TLDR
This work applies a hierarchical loss function to the training of prototypical networks and a method to aggregate prototypes hierarchically, mirroring the structure of a predefined musical instrument hierarchy, to enable classification of a wider set of musical instruments.
A Spell-checker Integrated Machine Learning Based Solution for Speech to Text Conversion
TLDR
This study proposes a speech-to-text conversion system for the Bengali language that uses a neural network to recognize audio files containing speech and then transcribe that speech into text.
Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks
TLDR
These models outperform a state-of-the-art method intended for the same music genre when evaluated with an increased F0 resolution, as well as a general-purpose method for multi-F0 estimation.
Jamming with a Smart Mandolin and Freesound-based Accompaniment
TLDR
Two use cases investigating how audio content retrieved from Freesound can be leveraged by performers or audiences to produce accompanying soundtracks for music performance with a smart mandolin are presented.

References

A Software Framework for Musical Data Augmentation
TLDR
This work develops a general software framework for augmenting annotated musical datasets, which will allow practitioners to easily expand training sets with musically motivated perturbations of both audio and annotations.
MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research
TLDR
MedleyDB, a dataset of annotated, royalty-free multitrack recordings, is shown to be considerably more challenging than the current test sets used in the MIREX evaluation campaign, thus opening new avenues in melody extraction research.
MedleyDB 2.0: New Data and a System for Sustainable Data Collection
TLDR
This work presents MedleyDB 2.0, the second iteration of a dataset of multitrack recordings created to support Music Information Retrieval (MIR) research, which has now grown to contain over 250 multitracks after the addition of 132 tracks in this release.
MIR_EVAL: A Transparent Implementation of Common MIR Metrics
Central to the field of MIR research is the evaluation of algorithms used to extract information from music data. We present mir_eval, an open source software library which provides a transparent and
A Dataset and Taxonomy for Urban Sound Research
TLDR
A taxonomy of urban sounds and a new dataset, UrbanSound, containing 27 hours of audio with 18.5 hours of annotated sound event occurrences across 10 sound classes are presented.
Essentia: An Audio Analysis Library for Music Information Retrieval
Paper presented at the 14th International Society for Music Information Retrieval Conference, held in Curitiba (Brazil), 4 to 8 November 2013.
librosa: Audio and Music Signal Analysis in Python
TLDR
A brief overview of the librosa library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.
Intuitive analysis, creation and manipulation of MIDI data with pretty_midi
  • In Proceedings of the 15th International Society for Music Information Retrieval Conference Late Breaking and Demo Papers,
  • 2014