A Dataset and Taxonomy for Urban Sound Research
A taxonomy of urban sounds and a new dataset, UrbanSound, containing 27 hours of audio with 18.5 hours of annotated sound event occurrences across 10 sound classes are presented.
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics
  • J. Salamon, E. Gómez
  • Computer Science
    IEEE Transactions on Audio, Speech, and Language…
  • 1 August 2012
A comparative evaluation of the proposed approach shows that it outperforms current state-of-the-art melody extraction systems in terms of overall accuracy.
MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research
MedleyDB, a dataset of annotated, royalty-free multitrack recordings, is shown to be considerably more challenging than the current test sets used in the MIREX evaluation campaign, thus opening new avenues in melody extraction research.
CREPE: A Convolutional Representation for Pitch Estimation
This paper proposes a data-driven pitch tracking algorithm, CREPE, which is based on a deep convolutional neural network that operates directly on the time-domain waveform, and evaluates the model's generalizability in terms of noise robustness.
Scaper: A library for soundscape synthesis and augmentation
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined “specification”, to increase the variability of the output.
Deep Salience Representations for F0 Estimation in Polyphonic Music
A fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset, is described and shown to achieve state-of-the-art performance on several multi-f0 and melody datasets.
Essentia: An Audio Analysis Library for Music Information Retrieval
Paper presented at the 14th International Society for Music Information Retrieval Conference, held in Curitiba (Brazil), 4 to 8 November 2013.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which combines part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year, and describes the latter in more detail.
Computer-aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency
Tony, a software tool for the interactive annotation of melodies from monophonic audio recordings, is presented, and it is shown that Tony’s built-in automatic note transcription method compares favourably with existing tools.