Search Result Clustering in Collaborative Sound Collections

  • Xavier Favory, Frederic Font, Xavier Serra
  • Published 8 April 2020
  • Computer Science
  • Proceedings of the 2020 International Conference on Multimedia Retrieval
The large size of today's online multimedia databases makes retrieving their content a difficult and time-consuming task. Users of online sound collections typically submit search queries that express a broad intent, often making the system return large and unmanageable result sets. Search Result Clustering is a technique that organises search-result content into coherent groups, which allows users to identify useful subsets in their results. Obtaining coherent and distinctive clusters that…


FSD50K: An Open Dataset of Human-Labeled Sound Events
FSD50K is introduced, an open dataset containing over 51k audio clips, totalling over 100 hours of audio, manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster sound event recognition (SER) research.
Generating sound palettes for a Freesound concatenative synthesizer to support creativity
This thesis focuses on the specific case of a concatenative synthesizer connected to Freesound. Evaluations suggest that the search method has no significant influence on the CSI score, and thus on the creative process, and the qualitative assessment shows no clear favourite.


Sound Sharing and Retrieval
This chapter describes how to build an audio database by outlining different aspects to be taken into account and discusses metadata-based descriptions of audio content and different searching and browsing techniques that can be used to navigate the database.
A survey of Web clustering engines
The issues that must be addressed in the development of a Web clustering engine, including the acquisition and preprocessing of search results and their clustering and visualization, are discussed, and the role played by the quality of the cluster labels is emphasized.
This paper provides an overview of the Rhythm Patterns feature set and demonstrates its suitability for music genre recognition, and outlines the principles of organizing digital music repositories using Self-Organizing Maps and presents the novel PlaySOM interface and the PocketSOMPlayer for mobile devices, both providing intuitively explorable music information spaces.
The folksonomy tag cloud: when is it useful?
An experiment, giving participants the option of using a tag cloud or a traditional search interface to answer various questions, found that where the information-seeking task required specific information, participants preferred the search interface.
Bringing Mobile Map-Based Access to Digital Audio to the End User
This paper shows alternative ways of interacting with large music collections, based on the Self-Organising Map clustering algorithm applied to an audio feature representation of audio files, with the goal of bringing Music Information Retrieval technologies closer to end users.
Faceted Search
This lecture explores the history, theory, and practice of faceted search, and offers a self-contained treatment of the topic, with an extensive bibliography for those who would like to pursue particular aspects in more depth.
Graph-based multimodal clustering for social multimedia
The proposed approach utilizes an example relevant clustering in order to learn a model of the “same cluster” relationship between a pair of items and is applied on two problems that are typically treated using clustering techniques; in particular, the problem of detecting social events and of discovering different landmark views in collections of social multimedia.
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Efficient k-nearest neighbor graph construction for generic similarity measures
NN-Descent is presented, a simple yet efficient algorithm for approximate K-NNG construction with arbitrary similarity measures that typically converges to above 90% recall with each point comparing only to several percent of the whole dataset on average.
CNN architectures for large-scale audio classification
In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.