• Publications
  • Influence
FSD50K: An Open Dataset of Human-Labeled Sound Events
FSD50K is introduced, an open dataset containing over 51 k audio clips totalling over 100 h of audio manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster SER research.
A Wavenet for Speech Denoising
The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature.
End-to-end Learning for Music Audio Tagging at Scale
This work focuses on studying how waveform-based models outperform spectrogram-based ones in large-scale data scenarios when datasets of variable size are available for training, suggesting that music domain assumptions are relevant when not enough training data are available.
Freesound Datasets: A Platform for the Creation of Open Audio Datasets
Comunicacio presentada al 18th International Society for Music Information Retrieval Conference celebrada a Suzhou, Xina, del 23 al 27 d'cotubre de 2017.
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.
Experimenting with musically motivated convolutional neural networks
This article explores various architectural choices of relevance for music signals classification tasks in order to start understanding what the chosen networks are learning and proposes several musically motivated architectures.
musicnn: Pre-trained convolutional neural networks for music audio tagging
The musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging, which can be used as out-of-the-box music audio taggers, as music feature extractors, or as pre- trained models for transfer learning.
Timbre analysis of music audio signals with convolutional neural networks
One of the main goals of this work is to design efficient CNN architectures — what reduces the risk of these models to over-fit, since CNNs' number of parameters is minimized.
Designing efficient architectures for modeling temporal features with convolutional neural networks
  • Jordi Pons, X. Serra
  • Computer Science
    IEEE International Conference on Acoustics…
  • 5 March 2017
A novel design strategy is proposed that might promote more expressive and intuitive deep learning architectures by efficiently exploiting the representational capacity of the first layer - using different filter shapes adapted to fit musical concepts within the first layers.
On automatic drum transcription using non-negative matrix deconvolution and itakura saito divergence
New contributions for audio event detection methods using the Itakura Saito divergence are studied that improve efficiency and numerical stability, and simplify the generation of target pattern sets.