librosa: Audio and Music Signal Analysis in Python
A brief overview of the librosa library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.
Metric Learning to Rank
A general metric learning algorithm is presented, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG.
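To make the ranking measures named above concrete, here is a minimal sketch that ranks items by distance from a query and scores the ranking with Precision-at-k, reciprocal rank (the per-query term of MRR), and average precision (the per-query term of MAP). It assumes a plain Euclidean distance and toy data; it does not implement the structural SVM learning step, and all function names are hypothetical.

```python
import math


def rank_by_distance(query, items):
    """Return item indices sorted by Euclidean distance from the query."""
    def dist(x):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(query, x)))
    return sorted(range(len(items)), key=lambda i: dist(items[i]))


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for i in ranking[:k] if i in relevant) / k


def reciprocal_rank(ranking, relevant):
    """1 / position of the first relevant item (0.0 if none is found)."""
    for pos, i in enumerate(ranking, start=1):
        if i in relevant:
            return 1.0 / pos
    return 0.0


def average_precision(ranking, relevant):
    """Mean of the precision values at each relevant item's position."""
    hits, total = 0, 0.0
    for pos, i in enumerate(ranking, start=1):
        if i in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant) if relevant else 0.0


# Toy data (assumed for illustration): four 2-D items and one query.
items = [(0.0, 1.0), (3.0, 3.0), (0.5, 0.0), (2.0, 0.0)]
query = (0.0, 0.0)
relevant = {0, 3}  # indices of the items considered relevant to the query

ranking = rank_by_distance(query, items)
p_at_2 = precision_at_k(ranking, relevant, 2)
rr = reciprocal_rank(ranking, relevant)
ap = average_precision(ranking, relevant)
```

A learned metric would change only the distance inside `rank_by_distance`; the measures themselves depend solely on the resulting order.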
Lasagne: First release.
Deep Salience Representations for F0 Estimation in Polyphonic Music
A fully convolutional neural network that learns salience representations for estimating fundamental frequencies, trained on a large, semi-automatically generated f0 dataset, is described and shown to achieve state-of-the-art performance on several multi-f0 and melody datasets.
The Natural Language of Playlists
A simple, scalable, and objective evaluation procedure for playlist generation algorithms is proposed and an efficient algorithm is developed to learn an optimal combination of simple playlist algorithms.
MIR_EVAL: A Transparent Implementation of Common MIR Metrics
Central to the field of MIR research is the evaluation of algorithms used to extract information from music data. We present mir_eval, an open source software library which provides a transparent and…
Learning Multi-modal Similarity
To cope with the ubiquitous problems of subjectivity and inconsistency in multi-media similarity, this work develops graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.
Robust Structural Metric Learning
This paper presents an efficient and robust structural metric learning algorithm which enforces group sparsity on the learned transformation, while optimizing for structured ranking output prediction.
Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
- Brian McFee, J. Salamon, J. Bello
- Computer Science, IEEE/ACM Transactions on Audio, Speech, and…
- 26 April 2018
This paper treats sound event detection (SED) as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality. It develops a family of adaptive pooling operators, referred to as autopool, which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
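The interpolation between pooling operators can be illustrated with a softmax-weighted average over frame-level predictions. The sketch below assumes a single scalar parameter `alpha` that is fixed by hand (in the paper's setting such a parameter would be learned), and the function name and toy values are hypothetical: `alpha = 0` gives the mean, and large `alpha` approaches the max.

```python
import math


def autopool(probs, alpha):
    """Softmax-weighted average of frame-level probabilities.

    alpha = 0 reduces to the unweighted mean; as alpha grows, the weights
    concentrate on the largest value, so the result approaches the max.
    """
    # Subtract the largest exponent before exponentiating, for stability.
    shift = max(alpha * p for p in probs)
    weights = [math.exp(alpha * p - shift) for p in probs]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total


# Toy frame-level predictions for one sound class (assumed values).
frame_probs = [0.1, 0.2, 0.9, 0.3]

mean_like = autopool(frame_probs, 0.0)     # behaves like mean pooling
softmax_like = autopool(frame_probs, 1.0)  # softmax-weighted pooling
max_like = autopool(frame_probs, 50.0)     # behaves like max pooling
```

Because the output varies smoothly with `alpha`, a model can in principle tune the pooling behaviour per class by gradient descent rather than committing to mean or max pooling in advance.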
Per-Channel Energy Normalization: Why and How
This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints, and describes the asymptotic regimes of PCEN: temporal integration, gain control, and dynamic range compression.
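The three stages named above can be sketched for a single frequency channel as follows. This is a minimal illustration, not the letter's reference implementation: the parameter values, the choice to initialize the smoother with the first frame, and the function name are all assumptions.

```python
def pcen_channel(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to one channel of non-negative energies over time.

    Stages:
      1. temporal integration: first-order IIR smoothing of the energy,
      2. gain control: divide the energy by the smoothed energy ** alpha,
      3. dynamic range compression: add a bias, raise to the power r,
         and subtract the constant bias ** r.
    """
    out = []
    m = energy[0]  # assumption: start the smoother at the first frame
    for e in energy:
        m = (1.0 - s) * m + s * e                     # temporal integration
        gain = e / (eps + m) ** alpha                 # gain control
        out.append((gain + delta) ** r - delta ** r)  # compression
    return out


# Toy inputs (assumed values): two steady levels and one with a transient.
quiet = pcen_channel([1.0] * 4)
loud = pcen_channel([100.0] * 4)
burst = pcen_channel([1.0, 1.0, 1.0, 100.0, 1.0])
```

With these settings a steady signal produces nearly the same output whether its level is 1 or 100, while the sudden burst stands out clearly, which is the behaviour gain control is meant to provide for far-field recordings with varying loudness.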