Corpus ID: 16214271

Autotagging music with conditional restricted Boltzmann machines

Michael I. Mandel, Razvan Pascanu, H. Larochelle, Yoshua Bengio
This paper describes two applications of conditional restricted Boltzmann machines (CRBMs) to the task of autotagging music. The first consists of training a CRBM to predict tags that a user would apply to a clip of a song based on tags already applied by other users. By learning the relationships between tags, this model is able to pre-process training data to significantly improve the performance of a support vector machine (SVM) autotagger. The second is the use of a discriminative RBM, a…
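The first application above, predicting a user's tags conditioned on tags applied by other users, can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names (`W`, `A`, `B`, `b`, `c`), the hand-set weights, and the use of a few mean-field updates in place of sampling from a trained model are all assumptions made for illustration. The sketch assumes a common CRBM formulation in which the conditioning units only shift the biases of the tag and hidden units.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def crbm_predict(context, W, A, B, b, c, steps=10):
    """Mean-field estimate of tag probabilities given context tags.

    All names are hypothetical:
      context : 0/1 list of tags already applied by other users
      W       : n_hidden x n_tags weights between hidden units and tags
      A       : n_tags x n_context weights from context to tag biases
      B       : n_hidden x n_context weights from context to hidden biases
      b, c    : static tag and hidden biases
    """
    # The conditioning units act only by shifting the two bias vectors.
    tag_bias = [bi + ai for bi, ai in zip(b, matvec(A, context))]
    hid_bias = [ci + bi for ci, bi in zip(c, matvec(B, context))]

    Wt = [list(col) for col in zip(*W)]  # transpose: n_tags x n_hidden
    v = [sigmoid(x) for x in tag_bias]   # initial guess from biases alone
    for _ in range(steps):
        # Alternate deterministic updates of hidden and tag probabilities.
        h = [sigmoid(hb + a) for hb, a in zip(hid_bias, matvec(W, v))]
        v = [sigmoid(tb + a) for tb, a in zip(tag_bias, matvec(Wt, h))]
    return v


# Toy example with hand-set (not learned) weights: 2 tags, 2 context tags,
# 1 hidden unit. Context tag 0 is on, which raises the bias of tag 0.
W = [[0.5, -0.5]]
A = [[2.0, 0.0], [0.0, 2.0]]
B = [[0.0, 0.0]]
b = [-1.0, -1.0]
c = [0.0]
probs = crbm_predict([1, 0], W, A, B, b, c)
```

In this toy setting, tag 0 receives a probability above 0.5 and tag 1 a probability below 0.5. In practice the weights would be learned from tag data rather than set by hand.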


Music autotagging as captioning
This work proposes formulating music autotagging as a captioning task, which automatically associates tags with a clip of music in the order a human would apply them, and conducts experiments on data collected from the MajorMiner game.
Contextual tag inference
It is shown that users agree more on tags applied to clips temporally “closer” to one another; that conditional restricted Boltzmann machine models of tags can more accurately predict related tags when they take context into account; and that when training data is “smoothed” using context, support vector machines can better rank these clips according to the original, unsmoothed tags.
Conditional Restricted Boltzmann Machines for Structured Output Prediction
This work argues that standard Contrastive Divergence-based learning may not be suitable for training CRBMs, proposes improved learning algorithms for two distinct types of structured output prediction problems, and shows that the new algorithms can work much better than Contrastive Divergence on both types of problems.
Learning Algorithms for the Classification Restricted Boltzmann Machine
It is argued that RBMs can provide a self-contained framework for developing competitive classifiers and it is shown that competitive classification performances can be reached when appropriately combining discriminative and generative training objectives.
Ensemble of machine learning algorithms for cognitive and physical speaker load detection
Seven classification models that contribute to the final prediction of a certain type of speaker load from acoustic features are presented, namely, a neural network with rectified linear units and dropout, a conditional restricted Boltzmann machine, logistic regression, a support vector machine, Gaussian discriminant analysis, and a random forest.
Retrieval and annotation of music using latent semantic models
A joint aspect model is developed that can learn from both tagged and untagged tracks by indexing both conventional words and muswords and is used as the basis of a music search system that supports query by example and by keyword, and of a simple probabilistic machine annotation system.
Neural Conditional Energy Models for Multi-label Classification
This paper presents the Neural Conditional Energy Model (NCEM) for multi-label classification (MLC): a hybrid deterministic-stochastic network in which a deterministic neural network transforms the input data before it contributes to the energy landscape of v, y, and a single stochastic hidden layer h.
Belief Propagation in Conditional RBMs for Structured Prediction
It is demonstrated that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems.
Acoustic emotion recognition based on fusion of multiple feature-dependent deep Boltzmann machines
The results show that the proposed method can improve the performance of classification of four dimensions and is suitable for classification of unbalanced data sets.
Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture


Autotagger: A Model for Predicting Social Tags from Acoustic Features on Large Music Databases
This paper uses a set of 360 classifiers trained using the online ensemble learning algorithm FilterBoost to map audio features onto social tags collected from the Web, allowing for insertion of previously unheard music into a social recommender system.
Exploring automatic music annotation with "acoustically-objective" tags
This work uses the Swat10k data set, which consists of 10,870 songs annotated using a vocabulary of 475 acoustic tags and 153 genre tags from Pandora's Music Genome Project, to develop an autotagging system and evaluates two new sets of content-based audio features obtained using the publicly-available Echo Nest API.
Multiple-Instance Learning for Music Information Retrieval
It is found that mi-SVM is better than a control at the recovery task on training clips, with an average classification accuracy as high as 87% over 43 tags; on test clips, it is comparable to the control, with an average classification accuracy of up to 68%.
Learning Tags that Vary Within a Song
It is found that the agreement between different people’s tags decreases as the distance between the parts of a song that they heard increases, and a conditional restricted Boltzmann machine is described to model this relationship.
Classification using discriminative restricted Boltzmann machines
This paper evaluates different learning algorithms for RBMs that aim to introduce a discriminative component into RBM training and improve their performance as classifiers, and demonstrates how discriminative RBMs can also be successfully employed in a semi-supervised setting.
Multi-Label Classification of Music into Emotions
In this paper, the automated detection of emotion in music is modeled as a multilabel classification task, where a piece of music may belong to more than one class. Four algorithms are evaluated and
Restricted Boltzmann machines for collaborative filtering
This paper shows how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBMs), can be used to model tabular data, such as users' ratings of movies, and demonstrates that RBMs can be successfully applied to the Netflix data set.
A Web-Based Game for Collecting Music Metadata
This work measured the degree to which binary classifiers could be trained to spot popular tags and compared the performance of clip classifiers trained with MajorMiner's tag data to those trained with social tag data from a popular website.
Exponential Family Harmoniums with an Application to Information Retrieval
An alternative two-layer model based on exponential family distributions and the semantics of undirected models is proposed, which performs well on document retrieval tasks and provides an elegant solution to searching with keywords.
Social Tagging and Music Information Retrieval
This work describes the state of the art in commercial and research social tagging systems for music, explains how tags are collected and used in current systems, and explores some of the issues encountered when using tags.