Corpus ID: 14419924

Learning Sparse Feature Representations for Music Annotation and Retrieval

@inproceedings{Nam2012LearningSF,
  title={Learning Sparse Feature Representations for Music Annotation and Retrieval},
  author={Juhan Nam and Jorge Herrera and Malcolm Slaney and Julius Orion Smith},
  booktitle={ISMIR},
  year={2012}
}
We present a data-processing pipeline based on sparse feature learning and describe its applications to music annotation and retrieval. Content-based music annotation and retrieval systems process audio starting with features. While commonly used features, such as MFCCs, are handcrafted to extract characteristics of the audio in a succinct way, there is increasing interest in learning features automatically from data using unsupervised algorithms. We describe a systematic approach applying feature…
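The pipeline the abstract describes — learn features from audio frames with an unsupervised algorithm, then use the sparse codes for annotation and retrieval — can be sketched with off-the-shelf tools. The sketch below is a minimal illustration under assumptions of my own (random stand-in spectrogram frames, a 128-atom dictionary, 5-nonzero OMP encoding, and scikit-learn's `MiniBatchDictionaryLearning`), not the authors' exact system:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
# Stand-in for log-mel spectrogram frames of one clip: (n_frames, n_bins).
frames = rng.standard_normal((500, 40))

# Learn an overcomplete dictionary from the frames and encode each frame
# as a sparse combination of atoms (OMP with at most 5 nonzero coefficients).
dico = MiniBatchDictionaryLearning(
    n_components=128,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=5,
    random_state=0,
)
codes = dico.fit(frames).transform(frames)   # (500, 128), sparse rows

# Summarize the whole clip by max-pooling activations over time,
# giving one fixed-length feature vector for annotation/retrieval.
clip_feature = np.abs(codes).max(axis=0)     # (128,)
```

Max pooling over time is one common choice for turning frame-level sparse codes into a clip-level feature; mean pooling is an equally plausible alternative.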

Citations

Audio Classification Using High-Dimensional Representations Learned on Standard Audio Features
TLDR: This work combines commonly used standard audio features and feeds them into a learning algorithm in order to find a hidden, abstract high-dimensional representation of the combined features.
MIREX 2012 Submission: Audio Classification Using Sparse Feature Learning
TLDR: This work applies a sparse Restricted Boltzmann Machine to audio data, focusing on learning high-dimensional sparse feature representations; evaluation results show that the learned representations achieve high accuracy.
Representation Learning of Music Using Artist Labels
TLDR: This paper presents a feature learning approach that uses the artist labels attached to every music track as objective metadata, training a deep convolutional neural network to classify audio tracks into a large number of artists.
An Audio and Music Similarity and Retrieval System Based on Sparse Feature Representations
TLDR: An audio and music similarity and retrieval (AMSR) system that employs sparse feature representations together with a locality-sensitive hashing (LSH) method, which facilitates approximate nearest-neighbor (ANN) search.
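Hashing clip features for fast approximate nearest-neighbor search, as in the AMSR system above, can be illustrated with random-hyperplane LSH. The sizes, bit count, and data below are placeholders of my own, not details from that paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(x, planes):
    # Sign of the projection onto each random hyperplane -> one bit each.
    return tuple((x @ planes.T > 0).astype(int))

# 1000 database feature vectors (64-dim) and 16 random hyperplanes
# giving a 16-bit hash per vector.
db = rng.standard_normal((1000, 64))
planes = rng.standard_normal((16, 64))

# Bucket database items by signature; similar vectors tend to collide.
buckets = {}
for i, v in enumerate(db):
    buckets.setdefault(lsh_signature(v, planes), []).append(i)

# A query probes only its own bucket instead of scanning all 1000 items.
query = db[0] + 0.01 * rng.standard_normal(64)
candidates = buckets.get(lsh_signature(query, planes), [])
```

The candidate list is then re-ranked with an exact distance, so the exhaustive comparison happens over a small bucket rather than the whole database.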
Music Annotation and Retrieval using Unlabeled Exemplars: Correlation and Sparse Codes
TLDR: Two exemplar-based approaches are presented that represent the content of a music clip by referring to a large set of unlabeled audio exemplars, exploiting the commonality of music signals to discover tag-specific acoustic patterns without domain knowledge or hand-designed features.
Towards real-time music auto-tagging using sparse features
  • Yi-Hsuan Yang
  • 2013 IEEE International Conference on Multimedia and Expo (ICME), 2013
TLDR: This paper investigates techniques to accelerate sparse feature extraction and music classification using support vector machines with linear or non-linear kernel functions, comparing state-of-the-art dense audio features with sparse features computed using 1) sparse coding with a random dictionary, 2) randomized clustering forests, and 3) an extension of randomized clustering forests to temporal signals.
Acoustic scene classification using sparse feature learning and event-based pooling
TLDR: The results show that learned features outperform MFCCs, that event-based pooling achieves higher accuracy than uniform pooling, and that a combination of the two performs even better than either used alone.
Feature Preprocessing with RBMs for Music Similarity Learning
TLDR: This study tests feature preprocessing with Restricted Boltzmann Machines in combination with established methods for learning distance measures, showing that the preprocessing improves the generalisation of the trained models.
Towards a more efficient sparse coding based audio-word feature extraction system
TLDR: The concepts of early and late temporal pooling are defined and added to the classic sparse-coding-based audio-word feature extraction pipeline, and both are tested on the genre-tag subset of the CAL10k data set.
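The early/late pooling distinction in the entry above can be made concrete with a toy audio-word encoder (hard assignment to a random codebook; all sizes and the encoder itself are illustrative assumptions, not that paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
frames = rng.standard_normal((200, 40))      # stand-in spectrogram frames
dictionary = rng.standard_normal((64, 40))   # random "audio word" codebook

def encode(x):
    # Hard-assign each row to its most similar atom (one-hot code per row).
    sims = x @ dictionary.T                  # (n, 64) similarities
    codes = np.zeros_like(sims)
    codes[np.arange(len(sims)), sims.argmax(axis=1)] = 1.0
    return codes

# Early pooling: average frames within short windows first, then encode.
early = encode(frames.reshape(20, 10, 40).mean(axis=1)).mean(axis=0)

# Late pooling: encode every frame, then average the codes over time.
late = encode(frames).mean(axis=0)
```

Early pooling encodes far fewer vectors (20 instead of 200 here), which is where its efficiency advantage comes from; late pooling preserves more frame-level detail at higher encoding cost.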

References

Showing 1-10 of 22 references
Unsupervised Learning of Sparse Features for Scalable Audio Classification
TLDR: A system to automatically learn features from audio in an unsupervised manner, using an overcomplete dictionary that sparsely decomposes log-scaled spectrograms and an efficient encoder that quickly maps new inputs to approximations of their sparse representations under the learned dictionary.
Semantic Annotation and Retrieval of Music and Sound Effects
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a…
Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine
TLDR: In k-NN-based genre retrieval experiments on three datasets, the mean-covariance Restricted Boltzmann Machine clearly outperforms MFCC-based methods, beats simple unsupervised feature extraction using k-means, and comes close to the state of the art.
Temporal Pooling and Multiscale Learning for Automatic Annotation and Ranking of Music Audio
TLDR: This paper analyzes some of the challenges in performing automatic annotation and ranking of music audio and proposes a few improvements, including the use of principal component analysis on the mel-scaled spectrum and the idea of multiscale learning.
Audio-based Music Classification with a Pretrained Convolutional Network
TLDR: A convolutional network is built and trained to perform artist recognition, genre recognition, and key detection; the convolutional approach improves accuracy on the genre recognition and artist recognition tasks.
Towards musical query-by-semantic-description using the CAL500 data set
TLDR: Qualitative and quantitative results demonstrate that the supervised multi-class labeling (SML) model can both annotate a novel song with meaningful words and retrieve relevant songs given a multi-word, text-based query.
Time Series Models for Semantic Music Annotation
TLDR: A novel approach to automatic music annotation and retrieval that captures temporal aspects as well as timbral content; a novel, efficient, hierarchical expectation-maximization algorithm for the dynamic texture mixture (HEM-DTM) summarizes the common information shared by the DTMs modeling individual songs associated with a tag.
Semantic Annotation and Retrieval of Music using a Bag of Systems Representation
We present a content-based auto-tagger that leverages a rich dictionary of musical codewords, where each codeword is a generative model that captures timbral and temporal characteristics of music.
A Classification-Based Polyphonic Piano Transcription Approach Using Learned Feature Representations
TLDR: This paper applies deep belief networks to musical data, evaluates the learned feature representations on classification-based polyphonic piano transcription, and suggests a way of training classifiers jointly for multiple notes to improve training speed and classification performance.
Unsupervised feature learning for audio classification using convolutional deep belief networks
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning…