Corpus ID: 3892073

Audio-based Distributional Semantic Models for Music Auto-tagging and Similarity Measurement

Giannis Karamanolakis, Elias Iosif, Athanasia Zlatintsi, Aggelos Pikrakis, Alexandros Potamianos
The recent development of Audio-based Distributional Semantic Models (ADSMs) enables the computation of audio and lexical vector representations in a joint acoustic-semantic space. In this work, these joint representations are applied to the problem of automatic tag generation. The predicted tags together with their corresponding acoustic representation are exploited for the construction of acoustic-semantic clip embeddings. The proposed algorithms are evaluated on the task of similarity… 
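The pipeline sketched in the abstract (predict tags for a clip in the joint acoustic-semantic space, then fuse the predicted tag vectors with the clip's acoustic vector) can be illustrated roughly as below. The dictionary layout, the top-k cosine tagging rule, and the equal-weight fusion are all illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors in the joint space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def auto_tag(clip_vec, tag_vecs, k=3):
    """Rank candidate tags by cosine similarity to the clip; keep the top k."""
    scored = sorted(tag_vecs.items(), key=lambda kv: -cosine(clip_vec, kv[1]))
    return scored[:k]

def clip_embedding(clip_vec, tag_vecs, k=3):
    """Build an acoustic-semantic clip embedding: similarity-weighted
    average of the top-k predicted tag vectors, fused (here by simple
    averaging) with the clip's acoustic vector."""
    top = auto_tag(clip_vec, tag_vecs, k)
    weights = np.array([cosine(clip_vec, v) for _, v in top])
    semantic = np.sum([w * v for w, (_, v) in zip(weights, top)],
                      axis=0) / weights.sum()
    return (clip_vec + semantic) / 2.0
```

Clip similarity can then be measured as the cosine between two such embeddings.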


Sensory-Aware Multimodal Fusion for Word Semantic Similarity Estimation
This work estimates multimodal word representations by fusing the auditory and visual modalities with the text modality, using both middle and late fusion of representations, with a modality weight assigned to each unimodal representation.
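A minimal sketch of the two fusion schemes mentioned above, with hypothetical function names and hand-picked modality weights (the paper's actual weighting scheme is not reproduced here):

```python
import numpy as np

def late_fusion(sims, weights):
    """Late fusion: combine per-modality similarity scores
    (e.g. text, auditory, visual) using normalised modality weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(w @ np.asarray(sims, dtype=float))

def middle_fusion(vectors, weights):
    """Middle fusion: scale each unimodal word vector by its modality
    weight, then concatenate into one multimodal representation."""
    return np.concatenate([w * np.asarray(v, dtype=float)
                           for w, v in zip(weights, vectors)])
```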


Audio-Based Distributional Representations of Meaning Using a Fusion of Feature Encodings
This work constructs an ADSM in order to compute distances between words (the lexical semantic similarity task) and is shown to significantly outperform state-of-the-art results reported in the literature.
Sound-based distributional models
The first results of the efforts to build a perceptually grounded semantic model based on sound data collected from freesound.org show that the models are able to capture semantic relatedness, with the tag-based model scoring higher than the sound-based model and the combined model.
Semantic Annotation and Retrieval of Music and Sound Effects
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query.
Music Information Retrieval Using Social Tags and Audio
In this paper we describe a novel approach to applying text-based information retrieval techniques to music collections. We represent tracks with a joint vocabulary consisting of both conventional…
Using Artist Similarity to Propagate Semantic Information
Four approaches for computing artists similarity based on different sources of music information (user preference data, social tags, web documents, and audio content) are compared in terms of their ability to accurately propagate three different types of tags.
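Tag propagation of the kind compared in this paper can be illustrated with a similarity-weighted vote from an artist's nearest neighbours; the data layout and the k-nearest-neighbour rule are assumptions made for the sketch:

```python
def propagate_tags(artist, tags, similarity, k=2):
    """Label an untagged artist with tags propagated from its k most
    similar tagged artists, each vote weighted by the similarity score.
    `tags` maps artist -> list of tags; `similarity[artist]` maps
    other artists -> similarity in [0, 1]."""
    neighbours = sorted(similarity[artist].items(),
                        key=lambda kv: -kv[1])[:k]
    scores = {}
    for other, sim in neighbours:
        for tag in tags.get(other, ()):
            scores[tag] = scores.get(tag, 0.0) + sim
    # Return tags ranked by accumulated weighted votes.
    return sorted(scores, key=scores.get, reverse=True)
```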
Contextual tag inference
It is shown that users agree more on tags applied to clips temporally “closer” to one another; that conditional restricted Boltzmann machine models of tags can more accurately predict related tags when they take context into account; and that when training data is “smoothed” using context, support vector machines can better rank these clips according to the original, unsmoothed tags.
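The "smoothing" of training labels with temporal context might look roughly like the following blend of each clip's tag counts with those of its immediate neighbours; the neighbourhood size and blending factor are illustrative, not taken from the paper:

```python
def smooth_tags(track, alpha=0.5):
    """track: list of per-clip tag-count dicts in temporal order.
    Each clip's counts are blended with its immediate neighbours,
    so temporally close clips end up with more similar labels."""
    smoothed = []
    for i, counts in enumerate(track):
        out = dict(counts)
        for j in (i - 1, i + 1):          # immediate temporal neighbours
            if 0 <= j < len(track):
                for tag, c in track[j].items():
                    out[tag] = out.get(tag, 0) + alpha * c
        smoothed.append(out)
    return smoothed
```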
Toward Evaluation Techniques for Music Similarity
A database, methodology and ground truth for the evaluation of automatic techniques for music similarity and a technique of sharing acoustic features rather than raw audio to avoid copyright problems are described.
Social Tagging and Music Information Retrieval
The state of the art in commercial and research social tagging systems for music is described, how tags are collected and used in current systems is explained, and some of the issues encountered when using tags are explored.
Deep content-based music recommendation
This paper proposes to use a latent factor model for recommendation, predicting the latent factors from music audio when they cannot be obtained from usage data, and shows that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach.
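As a rough illustration of the cold-start idea, here is a linear (ridge-regression) stand-in for the paper's deep CNN that maps audio features to collaborative-filtering latent factors; the linear model and all names are assumptions made for the sketch:

```python
import numpy as np

def fit_factor_predictor(audio_feats, latent_factors, ridge=1e-6):
    """Closed-form ridge regression from audio features X to the
    latent factors Y of a collaborative-filtering model."""
    X = np.asarray(audio_feats, dtype=float)
    Y = np.asarray(latent_factors, dtype=float)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)

def predict_factors(W, x):
    """Predict latent factors for a cold-start track (no usage data)
    from its audio features, so it can still be recommended."""
    return np.asarray(x, dtype=float) @ W
```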
Exploiting online music tags for music emotion classification
A novel data-sampling method is proposed that eliminates the imbalance of the online tags while still taking the prior probability of each emotion class into account, and a two-layer emotion classification structure is proposed to harness the genre information available in the online repository of music tags.