• Publications
The Million Song Dataset
We introduce the Million Song Dataset, a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
  • 946
  • 165
Audio Set: An ontology and human-labeled dataset for audio events
We present Audio Set, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research.
  • 793
  • 163
CNN architectures for large-scale audio classification
We use various CNN architectures to classify soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels.
  • 837
  • 119
The ICSI Meeting Corpus
We have collected a corpus of data from natural meetings that occurred at the International Computer Science Institute in Berkeley, California over the last three years.
  • 652
  • 60
Tandem connectionist feature extraction for conventional HMM systems
We show a large improvement in word recognition performance by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.
  • 791
  • 59
Consumer video understanding: a benchmark database and an evaluation of human and machine performance
We introduce a new consumer video database called CCV, containing 9,317 web videos over 20 semantic categories, including events like "baseball" and "parade", scenes like "beach", and objects like "cat".
  • 260
  • 58
librosa: Audio and Music Signal Analysis in Python
This document describes version 0.4.0 of librosa: a Python package for audio and music signal processing.
  • 698
  • 41
Prediction-driven computational auditory scene analysis
  • D. Ellis
  • Psychology, Computer Science
  • 1996
The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener: the 'auditory scene analysis' of the acoustic information.
  • 413
  • 40
A Discriminative Model for Polyphonic Piano Transcription
We present a discriminative model for polyphonic piano transcription. Support vector machines trained on spectral features are used to classify frame-level note instances. The classifier outputs are…
  • 252
  • 37
Model-Based Expectation-Maximization Source Separation and Localization
This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording.
  • 286
  • 34